The way forward for JakartaEE packages

Many of you might have read Mike Milinkovich’s post about the negotiations between the Eclipse Foundation and Oracle over the use of the javax.* package names.

https://eclipse-foundation.blog/2019/05/03/jakarta-ee-java-trademarks/

To give a short summary: Oracle is donating all the code and IP they have. This is a HUGE step for them, involving hundreds of developers and even more lawyers (hey, Oracle is a huge company, they do not do anything without a lawyer involved). So I don’t want to buy into the blame game; I’m rather thankful that they did this!

One thing we all hoped for did not materialise, though: Oracle doesn’t want us to change anything in packages starting with javax.*. That’s a pity, but it’s not worth fighting over.

In the following post I’d like to share my view about how we can go forward with JakartaEE from a purely technical perspective.

What options are on the table?

Since we are not allowed to change a single bit of the signatures in javax.*, we need to do it somewhere else. There was already a vote quite some time ago, and it was in favour of using jakarta.* for new JakartaEE specs. So it makes sense to also use this package for new features in existing specs.

Preface: All of these options have drawbacks, and I’m still not sure myself which would be the best solution.

Option A: Extend as we need changes

In the first scenario we would keep all javax.* packages alive. If we need to enhance some interface, we’d just extend it with a new interface in the jakarta.* namespace and add the new methods and functionality there.

For example, look at the following interface:

public interface javax.servlet.ServletRequest

In case we need some additional method in there we’d just extend this interface:

public interface jakarta.servlet.ServletRequest
  extends javax.servlet.ServletRequest {
    int someNewMethod();
}

The benefit would be that we do not need to touch servers. We also would be perfectly backward compatible – so no user code would need to change.

But of course it also comes with a few huge bummers:

1. We cannot add any additional attributes to annotations, and there is no extending of annotations in Java… All we could do is copy the annotation over to jakarta.* and enhance it there. And all frameworks would have to look up both annotations. This is really annoying.

2. Whenever we have interface hierarchies, it becomes slightly ugly. Think about javax.json.JsonValue and its sub-interfaces like JsonStructure, JsonArray, JsonNumber, etc.
In this case an interface in jakarta.* would need to extend both its parent in jakarta.* and the corresponding javax interface:

public interface jakarta.json.JsonNumber 
  extends javax.json.JsonNumber, jakarta.json.JsonValue 

3. It becomes virtually impossible for abstract class hierarchies, unless I missed something.
In the old days (before default methods were introduced in Java 8), abstract classes were used instead of interfaces to allow adding methods without forcing users to implement them. We see this prominently in JSF, for example in ResourceHandler.java and ResourceHandlerWrapper.java:

public abstract class javax.faces.application.ResourceHandlerWrapper
    extends javax.faces.application.ResourceHandler { ... }

But how to map this to jakarta.*?

public abstract class 
  jakarta.faces.application.ResourceHandlerWrapper 
    extends javax.faces.application.ResourceHandlerWrapper 

and

public abstract class 
  jakarta.faces.application.ResourceHandler 
    extends javax.faces.application.ResourceHandler

But then jakarta.faces.application.ResourceHandlerWrapper is not an instance of jakarta.faces.application.ResourceHandler anymore 😦
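The difference between points 2 and 3 can be reproduced in plain Java. The sketch below uses hypothetical nested types: the Old* types stand in for the javax.* originals and the New* types for their jakarta.* counterparts. Interfaces can close the "diamond" via multiple inheritance, while abstract classes have to pick a single superclass:

```java
// Hypothetical stand-ins: Old* = javax.* originals, New* = jakarta.* extensions.
class HierarchyDemo {

    // Point 2: interface hierarchies. Multiple inheritance of interfaces lets the
    // new sub-interface extend both its old counterpart and its new parent.
    interface OldValue {}
    interface OldNumber extends OldValue {}
    interface NewValue extends OldValue {}
    interface NewNumber extends OldNumber, NewValue {}

    // Point 3: abstract class hierarchies. A class has exactly one superclass,
    // so the new wrapper must choose between the old wrapper and the new handler.
    abstract static class OldHandler {}
    abstract static class OldWrapper extends OldHandler {}
    abstract static class NewHandler extends OldHandler {}
    abstract static class NewWrapper extends OldWrapper {}

    static final class MyNumber implements NewNumber {}
    static final class MyWrapper extends NewWrapper {}

    static boolean numberIsAlsoNewValue() {
        Object number = new MyNumber();
        return number instanceof NewValue && number instanceof OldNumber;
    }

    static boolean wrapperIsAlsoNewHandler() {
        Object wrapper = new MyWrapper();
        return wrapper instanceof NewHandler;
    }

    public static void main(String[] args) {
        System.out.println("NewNumber is a NewValue:   " + numberIsAlsoNewValue());    // true
        System.out.println("NewWrapper is a NewHandler: " + wrapperIsAlsoNewHandler()); // false
    }
}
```

The interface case compiles and behaves as intended; the class case compiles too, but the resulting wrapper silently loses its relationship to the new handler type.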

Option B: Extend all interfaces and classes now

This is basically Option A, but we would already extend all the interfaces in JakartaEE 8.
And of course, for now they would all be empty.

The main benefit of this would be that users could migrate all features from javax.* to jakarta.* without having to think about what is already there and what is not.

[See update 2019-05-07 for additional issues which popped up]

Option C: Replace all javax.* with jakarta.*

This is pretty much the most aggressive option, but it has some benefits as well. It would be a clean cut, and while users would have to migrate, they would only need to do it once. The rules are also really straightforward and clean: just replace all javax.* imports with jakarta.*.

Of course this option doesn’t come for free either. In this case it is not even the fault of JavaEE: there are still some JavaSE classes which reference classes that actually belong to JavaEE specifications. JSR-250 common annotations (javax.annotation.*) are an example. Other nasty mix-ups are in the javax.security and javax.sql area, as shown in the following case:

public interface javax.sql.XAConnection extends PooledConnection {
  javax.transaction.xa.XAResource getXAResource() throws SQLException;
}

Since javax.sql.XAConnection is not part of JakartaEE but of JavaSE, we do not have any control over the return type. It will keep returning javax.transaction.xa.XAResource for some time…

What we might do in such cases is the reverse of Option A: by default we just migrate specs to jakarta.*, but in rare cases like the one above, the jakarta.* types extend the original javax.* types. That way they could at least be upcast to the javax.* types which the SE APIs expect.

How to migrate older applications?

Such a jakarta-only server could still serve old applications which use the javax.* APIs. In this case we would provide a javaagent which does a class transformation and basically rewrites every reference to a JavaEE class into the corresponding JakartaEE class on the fly (in memory). Of course this is just a hack, but it might work out pretty well.

We might also provide tools like a maven-plugin which does the shading and transformation during the build already.

Or containers might also opt for doing this transformation when an app gets deployed to the server or on its first startup.
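Whichever variant is chosen, at its core it needs a mapping rule for internal class names (the slash-separated form used in bytecode). The sketch below shows only that rule. The prefix tables are hypothetical and deliberately incomplete, and the actual constant-pool rewriting inside a java.lang.instrument ClassFileTransformer or a build plugin would be delegated to a bytecode library (for example ASM’s ClassRemapper). Note that SE packages sharing an EE prefix, such as javax/transaction/xa, must be excluded first:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the javax -> jakarta renaming rule a class-transforming javaagent could apply.
// The prefix lists are illustrative assumptions, not a complete or authoritative mapping.
final class JakartaRenamer {

    // Java EE API packages that move to jakarta.* (internal names use '/').
    private static final List<String> EE_PREFIXES = Arrays.asList(
            "javax/servlet/",
            "javax/persistence/",
            "javax/enterprise/",
            "javax/faces/",
            "javax/json/",
            "javax/transaction/");

    // Packages shipped with Java SE that share an EE prefix and must stay untouched.
    private static final List<String> SE_EXCEPTIONS = Arrays.asList(
            "javax/transaction/xa/");

    private JakartaRenamer() {
    }

    // Maps an internal class name such as "javax/servlet/Filter" to its jakarta counterpart.
    static String mapInternalName(String internalName) {
        for (String sePrefix : SE_EXCEPTIONS) {
            if (internalName.startsWith(sePrefix)) {
                return internalName;
            }
        }
        for (String eePrefix : EE_PREFIXES) {
            if (internalName.startsWith(eePrefix)) {
                return "jakarta/" + internalName.substring("javax/".length());
            }
        }
        return internalName; // everything else (java/*, javax/sql/*, ...) is left alone
    }
}
```

Checking the exceptions before the EE prefixes matters: "javax/transaction/xa/XAResource" also starts with "javax/transaction/", but must keep its SE name.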

And what about the timeframe?

Initially it was concluded that JakartaEE 8 would be just a re-release of exactly the same content as JavaEE 8,
with the only difference that it is now handled under the umbrella of the Eclipse Foundation, and not the JCP anymore.

If this still holds true, we would get a JakartaEE 8 with javax.* and a JakartaEE 9 with jakarta.* packages. That might be really confusing.

I would personally prefer to prepare the rename now and already ship JakartaEE 8 with the jakarta.* packages.

Final thoughts

All those ideas are just the beginning. We need to dig deeper, try ideas out and put our heads together to come up with the best solution for everyone. There will most likely not be a perfect solution. But rest assured that we will not do stupid things!

Update 2019-05-07

I had a discussion with Emily Jiang on Twitter during which I found one more potential problem for Options A and B: we must keep the parameter types of all interfaces which users are expected to implement. Be it directly, e.g. javax.servlet.Servlet or javax.servlet.Filter, or indirectly by having wrappers in user code.

Any input parameter would basically remain javax.* forever. Let’s look at an example. We migrate javax.servlet.Filter to jakarta.*:

public interface jakarta.servlet.Filter
  extends javax.servlet.Filter {..
     void doFilter(javax.servlet.ServletRequest request, 
                   javax.servlet.ServletResponse response,
                   javax.servlet.FilterChain chain) throws .. ;
}

I think it’s pretty obvious that we cannot change this signature, right? On the other hand, we might want to enhance the ServletRequest, so we’d need to introduce a jakarta.servlet.ServletRequest extends javax.servlet.ServletRequest. And since we cannot change the Filter method signature, we’d have a big downcast party in all the code which touches it. Not a showstopper, but certainly not nice either.
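A minimal sketch of that cast party, using hypothetical stand-ins: OldRequest and OldFilter play javax.servlet.ServletRequest and javax.servlet.Filter, and NewRequest plays an enhanced jakarta.servlet.ServletRequest. Because the filter method keeps its javax parameter type, every filter that wants the new method has to check the type and cast:

```java
// Hypothetical stand-ins for the Option A/B situation described above.
class FilterCastDemo {

    interface OldRequest {                         // plays javax.servlet.ServletRequest
        String getParameter(String name);
    }

    interface NewRequest extends OldRequest {      // plays jakarta.servlet.ServletRequest
        int someNewMethod();
    }

    interface OldFilter {                          // plays javax.servlet.Filter
        String doFilter(OldRequest request);       // this signature cannot change
    }

    // A user filter that wants the new feature must downcast the javax-typed parameter.
    static final class MyFilter implements OldFilter {
        @Override
        public String doFilter(OldRequest request) {
            if (request instanceof NewRequest) {
                NewRequest newRequest = (NewRequest) request;
                return "new:" + newRequest.someNewMethod();
            }
            return "old:" + request.getParameter("x"); // fallback for plain javax requests
        }
    }
}
```

Every piece of code that touches the request needs this instanceof-and-cast dance, plus a fallback for containers or wrappers which still hand over the plain javax type.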

Categorisation of APIs

I think we have a few different kinds of APIs:

  • Interfaces which must be implemented by the users.
  • Interfaces which are purely ‘used’, but never implemented in user code.
  • Interfaces which are mostly ‘used’, but sometimes wrapped in user code.

I’d say we should pick a few of each and then try out how they feel under Options A, B and C before we decide which way to go.

Update 2019-05-10

I’ve already implemented quite a few spec APIs with the new package names (Option C style).

The APIs themselves are in the Apache Geronimo Spec Jakarta Branch Repository. This is work in progress and will be updated continuously.

I’ve also already migrated the core parts of the Apache OpenWebBeans CDI Container.

And finally, we already have an ok-ish branch for Apache Tomcat 10.

It is certainly a lot of work to migrate to the new package space. But so far I did not hit any showstoppers!

To beans or not to beans.xml?

When to use a beans.xml file in your CDI application?

Today there was a discussion on Twitter about whether beans.xml can simply be left out these days. This has been possible since the CDI-1.1 specification, but it results in slightly different handling of your application.

META-INF/beans.xml present

In the old CDI-1.0 days one always had to add a META-INF/beans.xml file as a marker.

If a classpath entry has a META-INF/beans.xml file present then this is called an Explicit Bean Archive.

If we have an Explicit Bean Archive and no other bean-discovery-mode is defined, every class found during class scanning gets picked up, and a ProcessAnnotatedType CDI lifecycle event (PAT event) is fired for it. There are not many classes which will not get handled that way; e.g. no PAT event will get fired for a non-static inner class. But you will even get a PAT event for interfaces, abstract classes and enums!

This feature (PAT for interfaces, etc.) is often used by CDI Extensions to collect information or to automatically register custom beans. E.g. the Apache DeltaSpike @MessageBundle mechanism is based on it.

Oh, there is another implication of this mode: every class which gets scanned and is a potential bean candidate will automatically get registered as a @Dependent scoped bean. This is the part I really don’t like, because it fills up your BeanManager with mostly useless garbage. It is especially bad for jars with hundreds of JavaSE classes but only a few CDI features on top of them. I will show how to solve this problem later.

JAR without any beans.xml

If no beans.xml is present (or the bean-discovery-mode is set to annotated), then only classes with a Bean Defining Annotation are picked up. In this case the ProcessAnnotatedType CDI lifecycle event will also only get fired for those classes. And of course features like @MessageBundle will simply not work in that case.

Trimmed Bean Archives

In CDI-2.0 we added a new feature. It is similar to the explicit bean archive as we know it from CDI-1.0, but it does not automatically pick up all classes as @Dependent scoped beans.

This mode is called trim mode and can be switched on with the following in your beans.xml file:

<beans>
  <trim/>
</beans>

The effect is the following: Every class will get picked up by the class scanning and a ProcessAnnotatedType CDI lifecycle event is fired. After the event the AnnotatedType will get inspected. If the AnnotatedType has a Bean Defining Annotation it will get registered as Bean<T>. Otherwise it will simply get ignored in the further processing.

This gives us a neat way to still do all the fancy CDI Extension tricks and use PAT as a kind of business class scanning, while at the same time not polluting our BeanManager with information it will never need.
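To summarise the three variants, here is a toy model of the decision made for each scanned class. It is not the CDI API and deliberately simplifies: it ignores vetoes, extensions that add or remove annotations during ProcessAnnotatedType, and the exact bean-candidate rules ("bean candidate" here roughly means a concrete class with a suitable constructor):

```java
// Toy model of CDI bean discovery for a single scanned class (not the CDI API).
class DiscoveryModel {

    enum Mode {
        ALL,        // explicit bean archive: beans.xml present, no other bean-discovery-mode
        ANNOTATED,  // no beans.xml, or bean-discovery-mode="annotated"
        TRIM        // explicit bean archive with <trim/> (CDI-2.0)
    }

    // Is a ProcessAnnotatedType (PAT) event fired for this class?
    static boolean firesPat(Mode mode, boolean hasBeanDefiningAnnotation) {
        // Only annotated mode restricts scanning to classes with a bean defining annotation.
        return mode != Mode.ANNOTATED || hasBeanDefiningAnnotation;
    }

    // Does the class end up registered as a Bean<T> in the BeanManager?
    static boolean registersBean(Mode mode, boolean hasBeanDefiningAnnotation,
                                 boolean isBeanCandidate) {
        if (!isBeanCandidate) {
            return false; // interfaces, abstract classes etc. get at most a PAT event
        }
        if (mode == Mode.ALL) {
            return true;  // every candidate becomes a bean (@Dependent if it has no scope)
        }
        return hasBeanDefiningAnnotation; // ANNOTATED and TRIM only keep annotated classes
    }
}
```

The interesting row is a plain helper class in TRIM mode: extensions still see it via PAT, but it never lands in the BeanManager.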

Optional vs “if null”

Lately I see a lot of code like

Optional.ofNullable(i).ifPresent(x -> doBla(x));

instead of the good old:

if (i != null) {
  doBla(i);
} 

It is debatable which style is easier to read. Especially when multiple layers are nested. It’s probably a matter of preference which style works better for you personally.

But what we can measure is the performance impact. Or, since I do not have that much time today, let’s just quickly estimate it.

Comparing the Source

The IF variant first:

NullFoo.java

public class NullFoo {
  public void test() {
    Integer i = 47;
    if (i != null) {
      doSomething(i);
    }
  }

  private void doSomething(Integer i) {
    // actually we do nothing
  }
}

and now the functional version:

OptionalFoo.java


import java.util.Optional;

public class OptionalFoo {
  public void test() {
    Integer i = 47;
    Optional.ofNullable(i).ifPresent(x -> doSomething(x));
  }

  private void doSomething(Integer i) {
    // actually we do nothing
  }
}

The ByteCode

Judge the performance for yourself.
First the old-style code as javap -c shows it:

~/tmp/delete/optionalfoo$>javap -c NullFoo.class
Compiled from "NullFoo.java"
public class NullFoo {
  public NullFoo();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public void test();
    Code:
       0: bipush        47
       2: invokestatic  #2                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       5: astore_1
       6: aload_1
       7: ifnull        15
      10: aload_0
      11: aload_1
      12: invokespecial #3                  // Method doSomething:(Ljava/lang/Integer;)V
      15: return
}

The important part is offset 7: ifnull. This is a dog-cheap operation. Once JIT-compiled, it boils down to a CJMPZ (compare and jump if zero), and any modern CPU can execute one or more such operations per clock cycle and core (especially if the branch is predicted correctly).

Now let’s look at the functional style Java bytecode:

~/tmp/delete/optionalfoo$>javap -c OptionalFoo.class
Compiled from "OptionalFoo.java"
public class OptionalFoo {
  public OptionalFoo();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public void test();
    Code:
       0: bipush        47
       2: invokestatic  #2                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       5: astore_1
       6: aload_1
       7: invokestatic  #3                  // Method java/util/Optional.ofNullable:(Ljava/lang/Object;)Ljava/util/Optional;
      10: aload_0
      11: invokedynamic #4,  0              // InvokeDynamic #0:accept:(LOptionalFoo;)Ljava/util/function/Consumer;
      16: invokevirtual #5                  // Method java/util/Optional.ifPresent:(Ljava/util/function/Consumer;)V
      19: return
}

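First we see the call to the static factory method Optional.ofNullable via invokestatic at offset 7 (the invokestatic at offset 2 is just the autoboxing of 47 via Integer.valueOf, which both variants share), followed by a push of the this pointer (aload_0).
Offset 11 is an invokedynamic instruction which creates the Consumer for our lambda. Its first execution bootstraps the call site via the LambdaMetafactory, which is pretty expensive; and since the lambda captures this, every later execution still has to create a new Consumer instance unless the JIT manages to optimise it away.
And of course there is the invokevirtual call to the ifPresent method itself.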

Note that the following code parts do a lot more work than a single CJMPZ:

  • Code inside Optional.ofNullable (which allocates a new Optional for a non-null value)
  • Code inside ifPresent

Conclusion

Of course, if you do not invoke this code a million times per second, it probably will not matter. In that case you should use the style which is more readable (I personally still prefer the old null check).

But if you have to do serious heavy lifting, then I suggest you benchmark your code with JMH. You will most probably end up with the classic if statement: the code using an old-school null check is about 200 times faster than the Optional variant.

Better test logs in parallel maven builds

Running Apache Maven in parallel

Apache Maven has a nice way to speed up builds by leveraging multiple CPU cores to build different modules in parallel.

$> mvn clean install -T8

will use 8 threads for your build.

More on parallel builds can be found here https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3

Failure analysis in Jenkins

But have you ever tried to use parallel builds on a huge project running in Jenkins (or locally with | tee some.log) when something broke?

You will end up with a big mess in your log file, because the output from multiple threads ends up interleaved. You have no way to tell which log line comes from which thread, and thus also no way to tell which log line comes from which module. That doesn’t exactly make finding failures easy.

How to separate the log lines?

What I did in many of my business projects to get rid of this problem is the following little hack (I’m using TestNG, but it’s similar in JUnit):

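import org.testng.annotations.BeforeMethod;

@BeforeMethod(alwaysRun = true)
public final void markThreads() {
  // prefix the thread name with the test class, so every log line shows where it came from
  Thread.currentThread().setName(
      Thread.currentThread().getId() + " - " +
      this.getClass().getSimpleName());
}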

You can provide this in a base class for all your unit tests, for example.
You will now see the name of the test class in each log line, as long as your logging framework prints the thread name (for Maven itself, see the update below).

PS: One could also switch Surefire to write a separate output file per test. But that slows down the build a bit and you end up with 2500 files. It also makes it rather hard to catch side effects caused by state left over from previous tests (that SHOULD not happen, but such bugs do happen).

*update*

Romain Manni-Bucau just asked me whether I used some special settings to make this work, because by default Maven doesn’t log the thread names.

And indeed he is correct. I totally forgot to point this out! In the Maven logging config ${MAVEN_HOME}/conf/logging/simplelogger.properties I have enabled the following setting:

org.slf4j.simpleLogger.showThreadName=true

This will make Maven show you the thread names.

There is, by the way, another useful switch:

org.slf4j.simpleLogger.showDateTime=true

This will additionally output the milliseconds elapsed since the start of the build, which is really useful when hunting down slow build parts.

toString(), equals() and hashCode() in JPA entities

Many users have generated toString(), equals() and hashCode() methods in their JPA entities,
but most of the time they underestimate the impact this can have.

This blog post is inspired by a chat I had with Gavin King and Vlad Mihalcea.

Preface: I’d like to emphasise that I put a strong focus on keeping customer code portable across different JPA vendors. Some ‘Uber trick’ might work in one JPA vendor and totally mess up the others. Each JPA provider is broken in its own very special way. Trust me, I know what I am talking about from both a user and a vendor perspective… The stuff I show here is the least common denominator for JBoss Hibernate, EclipseLink and Apache OpenJPA. Please shout out if you think some of the code shown does not work on one of those JPA providers.

toString()

What’s wrong with most toString() methods in entities?
Well, most of the time developers just use the ‘generate toString()’ shortcut of their IDE to create this method. That means the generated toString() method usually just reads all the attributes of your entity and prints them.

What happens if you touch an attribute depends to a large degree on which ‘mode’ your JPA provider runs in. In Hibernate you often have the pure class. In that case not much will happen if you only read attributes which are not Collections etc. By ‘using attributes’ I mean this.fieldname, not getters like this.getFieldname(), simply because Hibernate does not support lazy loading for any other fields in that mode. However, if you touch a @OneToMany or an @ElementCollection field, you will force lazy loading the first time toString() gets invoked. It might also behave differently if you use the getters instead of reading the attributes.

And if you use EclipseLink, Apache OpenJPA or even Hibernate in byte-code weaving mode, or if you get a javassist proxy from Hibernate (e.g. from em.getReference()), then you are in even deeper trouble, because in that case touching the attributes might trigger lazy loading for any other field as well.

I tried to explain how enhancement or ‘weaving’ works in JPA in a blog post many years ago: https://struberg.wordpress.com/2012/01/08/jpa-enhancement-done-right/ Parts of it might work a tad differently nowadays, but the basic approach should still be the same.

Note that OpenJPA will generate a toString() method for you if the entity class doesn’t have one. In that case we print the name of the entity and the primary key. And since we know the state of the _loaded fields, we also do not force generating a new PK if the entity hasn’t already got one from the sequence.
According to Gavin and Vlad, Hibernate doesn’t generate any toString(). I have no clue whether EclipseLink does.

For JPA implementations other than Apache OpenJPA, I suggest you provide a toString() which looks like the following:

public String toString() {
    return this.getClass().getSimpleName() + "-" + getId();
}

And not a single attribute more.

equals() and hashCode()

This is where Vlad, Gavin and I really disagree.
My personal opinion is that you should not write your own equals() or hashCode() methods for entities.

Vlad wrote a blog post about equals() and hashCode() in the past: https://vladmihalcea.com/2016/06/06/how-to-implement-equals-and-hashcode-using-the-entity-identifier/

As you can see, it’s not exactly easy to write proper equals() and hashCode() methods for JPA entities. Even Vlad’s advanced version has holes, e.g. if you use em.getReference() or em.merge().
In any case, there is one point Gavin, Vlad and I agree upon: generating equals() and hashCode() with IDEs is total bollocks for JPA entities. Comparing *all* fields is always broken: as soon as you update a single column, the entity is no longer equal to its former self and its hashCode() changes, so a HashSet holding it can no longer find it. You would simply not be able to update your database rows 😉

IF you want to write an equals() method, then compare the ids with a fallback on instance equality, and have hashCode() always return zero, as shown in Vlad’s blog.
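A minimal sketch of that approach, assuming a hypothetical Course entity with a generated Long id. The JPA annotations are left out so the example compiles without a JPA provider, and setId() merely stands in for the provider assigning the primary key. The comparison uses getId() on the other object rather than its field (a lazy proxy’s own fields are not populated) and instanceof rather than getClass() (proxy classes are subclasses of the entity):

```java
// Hypothetical entity, shown without @Entity/@Id so it compiles on its own.
class Course {
    private Long id;      // @Id @GeneratedValue in a real entity
    private String name;

    Course(String name) {
        this.name = name;
    }

    public Long getId() {
        return id;
    }

    // Stands in for the JPA provider assigning the primary key on persist/flush.
    void setId(Long id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;                  // fallback: instance equality
        }
        if (!(o instanceof Course)) {     // instanceof also accepts proxy subclasses
            return false;
        }
        Course other = (Course) o;
        // Transient instances (id == null) are only equal to themselves.
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        // Constant, so the hash bucket does not change when the id gets assigned later.
        return 0;
    }
}
```

The price of the constant hashCode() is that all such entities land in the same hash bucket, so very large HashSets of them degrade to linear lookups.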

Another way is to generate a UUID in the constructor or in the getId() method. But this is fairly expensive and also not very nice to handle on the DB side (large strings as PK consume a lot more storage in the indexes, on disk and in memory).

Using ‘natural IDs’ for equals()

That sounds promising. And IF you have a really good natural ID, then it’s also a good thing. But most of the time you don’t.

So what makes a good natural ID? It must adhere to the following criteria:

  • it must be unique
  • it must not change

Sadly, most natural IDs you can think of are not unique. The social security number (SSN) in most countries? Hah, not unique! Really, there are duplicates in most countries…
Also often used in examples: the ISBN of a book. Toooo bad that those are not unique either… Sometimes the same ISBN references different books, and sometimes the same book has multiple ISBNs assigned.

What about immutability? Sometimes a customer does not have an SSN yet. Or you simply don’t know it YET, or you only learn it further down the application process. So the SSN is null and only gets filled in later. Or you detect a collision with another person and have to assign one of them a new SSN (that really happens more often than you think!). There is also the case where the same physical person got multiple SSNs (which happens more frequently than you’d expect as well).

Many tables also simply don’t have a good natural ID. Romain Manni-Bucau came up with the example of a Blog entry. What natural ID does a blog entry have? The date? -> Not unique. The title? -> can get changed later…

Why do you need equals() and hashCode() at all?

This is a good question. And my answer is: “you don’t!”

The usual argument why people think it’s needed for JPA entities is a field like:

@OneToMany 
private Set others;

A HashSet internally of course uses equals() and hashCode(), but why would you need to provide custom ones? In my opinion the ones you implicitly inherit from Object are perfectly fine: they give you instance equality. And since the JPA specification guarantees that within a persistence context the EntityManager gives you exactly one entity instance per database row, you don’t need more. Doubt it? Then read the JPA specification yourself:

"An EntityManager instance is associated with a persistence context. A persistence context is a set of entity instances in which for any persistent entity identity there is a unique entity instance."

https://docs.oracle.com/javaee/7/api/javax/persistence/EntityManager.html

An exception where instance equality does not work is when you mix managed and detached entity instances. But that is something you should avoid at all costs, as my following examples show.

Why you shouldn’t store managed and detached entities in the same Collection

Why would you do that? Instead of storing entities in a Set, you can always use a Map keyed by the primary key. In that case you again don’t need any equals() or hashCode() for the whole entity. And even then you might get into trouble.

One example is a ‘cache’.
Say you have university management software with a Course table. Courses get updated only a few times per year and only by some administrative staff, but almost every page in the application reads them. So what could be more reasonable than to simply store the Course in a shared @ApplicationScoped cache Map for, say, an hour? Why don’t I use the cache management provided by some JPA containers? Many reasons. First and foremost, they are not portable. They are also really tricky to configure (I’m talking about real production, not a sample app!). And you want to have FULL control over the cache!

So, having a cache is really a great idea, but *please* do not store JPA entities in it. At least not as long as they are managed. All is fine as long as you only run it locally, click around in your app and only do unit tests. But under heavy load in production (our app averaged 5 million page hits per day) you will hit the following problem:

The JPA specification does not allow an EntityManager to be used from multiple threads at the same time. As a managed entity is bound to an EntityManager, this limitation also affects the entities themselves.
So when you do the em.find() and later a coursesCache.put(courseId, course), the entity is still in ‘managed’ mode! And under heavy load it *will* happen that another user gets the still-managed entity from the cache before it got detached (which happens at tx commit or request end, depending on your setup). Boooommm it goes…

How can you avoid that? Simply use a view object. Normally the full database entities with all their gory attribute details and sub-tables are not needed for an overview course list anyway. So you’d better use a constructor expression (‘SELECT NEW’) query:

CourseListVO courseViewItem = em.createQuery(
      "SELECT NEW org.myproject.CourseListVO(c.id, c.name, ...) " +
      " FROM Course AS c WHERE ...", CourseListVO.class)
    .getSingleResult();
cache.put(courseId, courseViewItem);

By using a constructor expression (SELECT NEW) you get instances which are not managed by the EntityManager. It’s also much faster and consumes less memory, by the way.
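A sketch of the cache side, assuming a hypothetical immutable CourseListVO and a loader function which would run the SELECT NEW query above. Because the cache only ever holds unmanaged, immutable view objects, concurrent readers never touch an EntityManager. In a CDI application the cache would be an @ApplicationScoped bean; here it is a plain class so it runs without a container:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical immutable view object: safe to share between threads, unlike a managed entity.
final class CourseListVO {
    private final long id;
    private final String name;

    CourseListVO(long id, String name) {
        this.id = id;
        this.name = name;
    }

    long getId() {
        return id;
    }

    String getName() {
        return name;
    }
}

// Hypothetical application-wide cache; in a CDI app this would be an @ApplicationScoped bean.
final class CourseCache {
    private static final long TTL_MILLIS = 60L * 60L * 1000L; // keep entries for one hour

    private static final class Entry {
        final CourseListVO value;
        final long loadedAtMillis;

        Entry(CourseListVO value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final ConcurrentMap<Long, Entry> entries = new ConcurrentHashMap<>();
    // Loads one view object, e.g. by running the SELECT NEW query with its own EntityManager.
    private final Function<Long, CourseListVO> loader;

    CourseCache(Function<Long, CourseListVO> loader) {
        this.loader = loader;
    }

    CourseListVO get(long courseId) {
        long now = System.currentTimeMillis();
        Entry entry = entries.get(courseId);
        if (entry == null || now - entry.loadedAtMillis > TTL_MILLIS) {
            // Two threads may both reload an expired entry; that only costs an extra
            // query, because the cached values are immutable and never managed.
            entry = new Entry(loader.apply(courseId), now);
            entries.put(courseId, entry);
        }
        return entry.value;
    }
}
```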

Oh, I’m sure there are things which are still not considered yet…

PS: this is not an easy topic as you might be able to judge from looking at the involved people. Gavin is the inventor of Hibernate and JPA, Vlad is the current Hibernate maintainer. And I was involved in the DODS DB layer of Lutris Enhydra in the 90s and am a long time Apache OpenJPA committer (and even the current PMC chair).

Applying interceptors to producer methods

Interceptors are really cool if you have a cross-cutting problem and want to apply a solution over the whole code base without making every single colleague copy the same code over and over again.

In my case it was the urge to log SOAP and REST invocations to other systems. I also want to add a logCorrelationId via HTTP header to each outgoing SOAP call. You can read more about the background in my other logCorrelation blog post.

I’ll focus on integrating SOAP clients, but you can easily do the same for REST clients as well.

Integrating a SOAP client in an EE project

Usually I create a CDI producer for my SOAP ports. That way I can easily mock them out with a local dummy implementation by just using CDI’s @Specializes or @Alternative. If you combine this with Apache DeltaSpike @Exclude and the DeltaSpike configuration system, you can even enable those mocks via ProjectStage or a configuration setting.

Suppose you have a WSDL and you create a SOAP client with the interface CustomerService.

What we want from a ‘consumer’ perspective is the following usage:

public class SomeFancyClass {
  private @Inject CustomerService customerService;
  ...
}

Which means you need a CDI producer method, e.g. something like:

@ApplicationScoped
public class CustomerServiceSoapClientProducer {
  @Inject
  @ConfigProperty(name = "myproject.customerService.endpointUrl")
  private String customerServiceEndpointUrl;

  @Produces
  @RequestScoped
  @LogTiming
  public CustomerService createSoapPort() {
    // generated from the WSDL, e.g. via CXF
    CustomerServiceService svc = new CustomerServiceService();
    CustomerService port = svc.getCustomerServiceServicePort();

    // this sets the endpoint URL during producing.
    ((BindingProvider) port).getRequestContext().
           put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, customerServiceEndpointUrl);

    return port;
  }
}

Side note: the whole class could also be @RequestScoped to get the endpoint URL evaluated on every request. We could of course also use the DeltaSpike ConfigResolver programmatically to achieve the same. The whole point of setting the endpoint URL manually is that we don’t need to change the WSDL and recompile the project on every server change. That way we can also use different endpoints for various boxes (test vs production environments, or different customers).

What is this @LogTiming stuff?

Now it becomes interesting! We now have a SOAP client which looks like a regular CDI bean from a ‘user’ point of view. But we want more information about that outgoing call. After all, it’s an external system and we have no clue how it behaves in terms of performance. That means we want to record each and every SOAP call and log its duration. And since we not only have one SOAP service client but several dozen, we want to do this via an Interceptor!

@Inherited
@InterceptorBinding
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface LogTiming {
}

Applying an Interceptor on a producer method?

Of course the code written above DOES work. But it behaves quite differently from what many of you would guess.
If you apply an interceptor annotation to a producer method, then it will not intercept the calls to the produced bean!
Instead it will only intercept the invocation of the producer method itself. A producer method gets invoked when the Contextual Instance gets created. For a @Produces @RequestScoped annotated producer method this happens the first time a method on the produced CDI bean gets called in that request (or thread, for threads not based on servlet requests). And exactly this call gets intercepted.

If we just put a stopwatch into this interceptor, we would only learn how long it took to create the SOAP client. That’s not what we want! We want the timings of each and every invocation of that CustomerService! So what does our LogTiming interceptor do?

Proxying the Proxy

The trick is to use our LogTiming interceptor to wrap the produced SOAP port in yet another proxy, which logs the request times etc. As explained before, we cannot use CDI interceptors for that, but we can use java.lang.reflect.Proxy:

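import java.lang.reflect.Proxy;
import javax.interceptor.AroundInvoke;
import javax.interceptor.Interceptor;
import javax.interceptor.InvocationContext;
import org.apache.deltaspike.core.util.ClassUtils;

@LogTiming
@Interceptor
public class WebserviceLoggingInterceptor {

    @AroundInvoke
    private Object wrapProxy(InvocationContext ic) throws Exception {
        // ic.proceed() runs the producer method itself, not the later business calls
        Object producedInstance = ic.proceed();
        Class<?>[] interfaces = producedInstance.getClass().getInterfaces();
        Class<?> returnType = ic.getMethod().getReturnType();
        // wrap the produced SOAP port so that every later call goes through the handler
        return Proxy.newProxyInstance(ClassUtils.getClassLoader(null), interfaces,
                new LoggingInvocationHandler<>(producedInstance, returnType));
    }
}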

This code will register our reflect Proxy in the CDI context, and each time someone calls a method on the injected CustomerService it will hit the LoggingInvocationHandler. By the way, this handler can also do other neat stuff: it can pass the logCorrelationId (explanation see my other blog post linked above) as an HTTP header to the outgoing SOAP call.

The final LoggingInvocationHandler looks like the following:

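import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.ws.BindingProvider;
import javax.xml.ws.handler.MessageContext;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingInvocationHandler<T> implements InvocationHandler {
    private static final long SLOW_CALL_THRESHOLD = 100; // ms
    private static final Set<String> EXCLUDED_METHODS =
            new HashSet<>(Arrays.asList("toString", "hashCode", "equals"));

    private final Logger logger;
    private final T delegate;

    public LoggingInvocationHandler(T delegate, Class<?> loggerClass) {
        this.delegate = delegate;
        this.logger = LoggerFactory.getLogger(loggerClass);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (EXCLUDED_METHODS.contains(method.getName())) {
            // don't log toString(), hashCode() etc...
            return invokeDelegate(method, args);
        }

        long start = System.currentTimeMillis();

        try {
            // setting log correlation header if any logCorrelationId is set on the thread.
            String logCorrelationId = LogCorrelationUtil.getCorrelationId();
            if (StringUtils.isNotEmpty(logCorrelationId) && delegate instanceof BindingProvider) {
                BindingProvider port = (BindingProvider) delegate;
                @SuppressWarnings("unchecked")
                Map<String, List<String>> headers =
                        (Map<String, List<String>>) port.getRequestContext().get(MessageContext.HTTP_REQUEST_HEADERS);
                if (headers == null) {
                    headers = new HashMap<>();
                }
                headers.put(LogCorrelationUtil.REQUEST_HEADER_CORRELATION_ID, Collections.singletonList(logCorrelationId));
                port.getRequestContext().put(MessageContext.HTTP_REQUEST_HEADERS, headers);
            }

            // continue with the real call
            return invokeDelegate(method, args);
        }
        finally {
            long duration = System.currentTimeMillis() - start;
            if (duration <= SLOW_CALL_THRESHOLD) {
                logger.info("soapRemoteCall took={} ms service={} method={}", duration, delegate.getClass().getName(), method.getName());
            }
            else {
                // slow call: log a more detailed msg, e.g. with params
                logger.warn("slow soapRemoteCall took={} ms service={} method={} args={}",
                        duration, delegate.getClass().getName(), method.getName(), Arrays.toString(args));
            }
        }
    }

    /** Calls the real port and rethrows its own exception instead of the reflection wrapper. */
    private Object invokeDelegate(Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(delegate, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}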
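You can try the same ‘wrap the produced instance’ trick without a container. Below is a minimal sketch that uses only the JDK. It assumes a hypothetical CustomerService interface and a hypothetical produceWrapped() helper that stands in for the interceptor on the producer method. The producer runs exactly once, and every later business call goes through a timing InvocationHandler.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Hypothetical service interface standing in for the generated SOAP port interface.
interface CustomerService {
    String findCustomerName(long id);
}

final class TimingProxies {
    /** One entry per intercepted business call, e.g. "findCustomerName took=0 ms". */
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    /**
     * Plays the role of the @AroundInvoke on the producer method: the producer runs exactly
     * once, and the produced instance is wrapped in a java.lang.reflect.Proxy that times
     * every subsequent call.
     */
    static <T> T produceWrapped(Supplier<T> producer, Class<T> iface) {
        T delegate = producer.get(); // the only call an interceptor on the producer ever sees
        InvocationHandler timing = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the delegate's own exception, not the reflection wrapper
            } finally {
                long millis = (System.nanoTime() - start) / 1_000_000;
                LOG.add(method.getName() + " took=" + millis + " ms");
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, timing));
    }
}
```

Suppose you call TimingProxies.produceWrapped(() -> id -> "customer-" + id, CustomerService.class) and then invoke findCustomerName twice. LOG ends up with two entries, but the producer lambda runs only once.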

Limitations

Of course this trick only works if the producer method returns an interface! That’s because java.lang.reflect.Proxy can only implement interfaces; it cannot subclass a concrete class.
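A quick way to see this limitation is to ask java.lang.reflect.Proxy for a proxy of a concrete class. The sketch below reports whether Proxy accepts a given type. CustomerServiceImpl is a hypothetical stand-in for a producer that returns an implementation type. For anything that is not an interface, newProxyInstance throws an IllegalArgumentException.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class ProxyLimitation {
    /** Hypothetical concrete class, standing in for a producer that returns an implementation type. */
    static class CustomerServiceImpl {
        String findCustomerName(long id) {
            return "customer-" + id;
        }
    }

    /** Returns true if java.lang.reflect.Proxy can create a proxy for the given type. */
    static boolean canProxy(Class<?> type) {
        InvocationHandler noop = (proxy, method, args) -> null;
        try {
            Proxy.newProxyInstance(ProxyLimitation.class.getClassLoader(), new Class<?>[] {type}, noop);
            return true;
        } catch (IllegalArgumentException e) {
            // e.g. "...CustomerServiceImpl is not an interface"
            return false;
        }
    }
}
```

canProxy(Runnable.class) succeeds, while canProxy(CustomerServiceImpl.class) fails. A subclassing proxy would need bytecode generation at runtime, and the JDK Proxy API does not offer that.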

I’m trying to remove this limitation by bringing interceptors for produced instances to CDI-2.0. I’m also working on an Interceptors spec change that would make subclassing proxies as easy to create as interface proxies. Stay tuned!

What is LogCorrelation?

In my article about CDI interceptors on producer methods I mentioned logCorrelation. I didn’t go into detail on this topic over there, as it would simply have been too much. And it makes a great topic for a separate post anyway. So here we go…

So what is LogCorrelation?

Consider a more or less distributed application topology. You might have a server which maintains customer data. There might be another box handling the document archive, another one which holds the calculation kernel, etc.

Nowadays everyone would call these microservices. 8 years ago everyone called it SOA. To be honest, I DON’T GIVE A SHIT what some sales people call it. All this has been around for much longer than I’ve been working in the industry (which is a whopping 25 years already). It’s just modular applications talking with each other somehow. Sometimes via SOAP or REST, but maybe even via message queues, shared database tables, or files, with batches handling the hand-over. To me it doesn’t matter much.

But for all of them the problem is the same. Consider a user who clicks on some button in his browser or fat client. This triggers an ‘application action’. And this single action might hit the first server, then this server pings another one, etc. Whether this happens synchronously or asynchronously doesn’t matter either. This might go all over the place in your company and even externally. In the end something happens, and most of the time the user gets some response. And it is really, REALLY hard to tell what’s wrong and where it went wrong if something doesn’t work as expected or returns wrong results. Often you don’t even have a clue which servers were involved. And if your whole application starts to behave ‘laggy’, you will have a hard time judging which system you need to tinker with.

Now how cool would it be if you could follow this single action over all the involved servers?

And this is exactly what logCorrelation does!

What is the trick?

The trick is really simple. Each ‘action’ gets its own unique logCorrelationId. That might be a UUID, for example. The only requirement is that it’s ‘decently’ unique.

When a server gets a request, it checks whether a logCorrelationId was passed to it. If so, it takes this id and stores it in a ThreadLocal. If no id was passed, then this is a ‘new’ action and we generate a fresh logCorrelationId. Of course this logCorrelationId also gets set on all subsequent outgoing HTTP calls made on this very thread, e.g. as an HTTP header.
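Here is a minimal plain-Java sketch of this hand-over. It assumes headers arrive as a simple Map<String, String>, a hypothetical stand-in for the servlet or JAX-WS header API. The header name mirrors the REQUEST_HEADER_CORRELATION_ID constant of the LogCorrelationUtil further below. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper showing how a logCorrelationId is reused, stored per thread and forwarded. */
final class CorrelationPropagation {
    static final String HEADER = "X_LOG_CORRELATION_ID";
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Called when a request arrives: reuse the caller's id, or start a new 'action'. */
    static String onIncomingRequest(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.get(HEADER);
        if (id == null || id.isEmpty()) {
            id = UUID.randomUUID().toString(); // no id passed in: this is a new action
        }
        CURRENT.set(id);
        return id;
    }

    /** Called before every outgoing call on this thread: forward the id, if any. */
    static Map<String, String> outgoingHeaders() {
        Map<String, String> headers = new HashMap<>();
        String id = CURRENT.get();
        if (id != null) {
            headers.put(HEADER, id);
        }
        return headers;
    }

    /** Must run at the end of every request, otherwise pooled threads keep a stale id. */
    static void onRequestFinished() {
        CURRENT.remove();
    }
}
```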

Where do I see the logCorrelationId?

Our applications now all handle this logCorrelationId properly, but where can I look at it? What is the benefit of all this?

At my customers I mainly use Apache log4j as the logging backend (often with slf4j as the API). The point is that only log4j (and logback, but with way worse performance) supports a nice little feature called MDC, which stands for Mapped Diagnostic Context. It is basically a ThreadLocal<Map<String, String>> whose values can be written out in each and every line you log on this very thread.

This log4j feature can also be accessed via the slf4j API, e.g. in a javax.servlet.Filter:

import org.slf4j.MDC;

MDC.put("correlationId", logCorrelationId);
MDC.put("sessionId", httpSessionId);
MDC.put("userId", String.valueOf(loggedInUser.getId()));

To get these values into the log output, reference the MDC keys via %X{...} in your ConversionPattern:

<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
    <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <param name="Target" value="System.out"/>
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{ISO8601} [%t] %X{sessionId} %X{userId} %X{correlationId} %-5p %c{2} %m%n"/>
        </layout>
    </appender>
    <root>
        <level value="info"/>
        <appender-ref ref="console"/>
    </root>
</log4j:configuration>
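To make the %X{...} mechanics concrete, here is a toy model. It is not log4j’s implementation. It consists of a ThreadLocal<Map<String, String>> plus a formatter that replaces %X{key} with the current thread’s value and %m with the message. ToyMdc is a hypothetical name, and only these two conversion patterns are supported.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of an MDC, not log4j's implementation: a per-thread map rendered via %X{key}. */
final class ToyMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT = ThreadLocal.withInitial(HashMap::new);
    // Supports only %X{key} (MDC lookup) and %m (the message).
    private static final Pattern TOKEN = Pattern.compile("%X\\{([^}]+)\\}|%m");

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static void remove(String key) {
        CONTEXT.get().remove(key);
    }

    /** Renders one log line; in this sketch an unset key renders as an empty string. */
    static String format(String pattern, String message) {
        Map<String, String> context = CONTEXT.get();
        return TOKEN.matcher(pattern).replaceAll(match -> Matcher.quoteReplacement(
                match.group(1) != null ? context.getOrDefault(match.group(1), "") : message));
    }
}
```

After ToyMdc.put("correlationId", "abc-123"), the pattern "%X{sessionId} %X{correlationId} %m" renders as " abc-123 hello" for the message "hello". A different thread sees none of these values.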

If your logging is configured properly in your company and you funnel everything into a log aggregation system like ELK (OpenSource with commercial support offering) or Splunk (commercial with a limited free offering), you can now simply follow a single action across all the various systems.

What about non-user requests?

Non-user requests can sometimes be enriched with even more information. At a few customers we use the Camunda BPMN Suite (OpenSource with commercial support). Its core has a Thread which basically polls the DB and fetches new tasks to execute. Those then get ‘executed’ in a parallel thread. For those threads we intercept the Executor and fill the logCorrelationId with the Camunda jobId, which basically is a UUID starting with ‘cam-’. So once a process task blows up, we can figure out exactly what went wrong, even on a different server.
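The Executor interception could look roughly like the decorator below. This is a sketch under several assumptions. CorrelatingExecutor is hypothetical. The job id is passed in explicitly instead of being read from the process engine, so no Camunda API is used here. The decorator prepends the ‘cam-’ prefix itself.

```java
import java.util.concurrent.Executor;

/** Hypothetical decorator: every task runs with a log correlation id set on its worker thread. */
final class CorrelatingExecutor {
    static final ThreadLocal<String> CORRELATION_ID = new ThreadLocal<>();
    private final Executor delegate;

    CorrelatingExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    void execute(String jobId, Runnable task) {
        String correlationId = "cam-" + jobId; // assumed prefixing scheme
        delegate.execute(() -> {
            CORRELATION_ID.set(correlationId);
            try {
                task.run();
            } finally {
                CORRELATION_ID.remove(); // pooled worker threads must not keep the id
            }
        });
    }
}
```

Wrapping e.g. an Executors.newFixedThreadPool(4) this way means that every log line written by a task carries its own id, and nothing leaks into the next task that reuses the pooled thread.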

Of course this trick is not limited to the process engine…

PS: what does my stuff look like?

Probably it’s also worth sharing my LogCorrelationUtil:

import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;
import org.slf4j.MDC;

/**
 * Helper for log correlation.
 *
 * @author Mark Struberg
 */
public class LogCorrelationUtil {

    public static final String REQUEST_HEADER_CORRELATION_ID = "X_LOG_CORRELATION_ID";
    public static final String MDC_CORRELATION_ID = "correlationId";

    private LogCorrelationUtil() {
    }

    /**
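     * Creates a new log correlation id ONLY IF there is no existing one!
     * Takes about 4 µs, because it uses a much faster UUID generation (two ThreadLocalRandom longs instead of the SecureRandom behind UUID.randomUUID()).
     *
     * @param logEnvironment prefix for the logCorrelationId if a new one has to be created. Determines the environment the uuid got created in.
     * @param existingLogCorrelationId the id passed in by the caller, or {@code null} if there is none yet
     * @return the existing id if present, otherwise a freshly generated one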
     */
    public static String enforceLogCorrelationId(LogEnvironment logEnvironment, String existingLogCorrelationId) {
        if (existingLogCorrelationId != null && existingLogCorrelationId.length() > 0) {
            return existingLogCorrelationId;
        }
        ThreadLocalRandom random = ThreadLocalRandom.current();
        String uuid = new UUID(random.nextLong(), random.nextLong()).toString();

        if (logEnvironment != null) {
            StringBuilder sb = new StringBuilder(60);
            sb.append(logEnvironment);
            sb.append(uuid);
            uuid = sb.toString();
        }
        return uuid;
    }

    /**
     * @return the logCorrelationId for this thread or {@code null}
     */
    public static final String getCorrelationId() {
        return MDC.get(MDC_CORRELATION_ID);
    }

    /**
     * Set the given logCorrelationId for the current Thread.
     */
    public static final void setCorrelationId(String logCorrelationId) {
        MDC.put(MDC_CORRELATION_ID, logCorrelationId);
    }

    /**
     * Clears the logCorrelationId from the current Thread.
     * This method MUST be called at the end of each request 
     * to prevent mem leaks!
     */
    public static final void clearCorrelationId() {
        MDC.remove(MDC_CORRELATION_ID);
    }
}

Being ‘unstoppable’: a Batchlet’s tale

How to stop a JBatch Batch

JSR-352 (JBatch) is a great specification. It takes care of many situations that users usually don’t think about, for example how to stop a Batch. But sometimes it cannot relieve you from putting some thought into it.

How to trigger a Batch stop?

The JobOperator has a method to stop a specific execution: JobOperator#stop(long executionId).

Of course the JobOperator will not immediately kill the worker thread with this batch but tries to gracefully shut down the Batch.

Stopping a ‘Chunk Step’

First, what is a ‘Chunk’? A chunk is a batch <step> which consists of an ItemReader, an optional ItemProcessor and an ItemWriter. The chunk size defines the transaction size of the processing. Let’s consider a chunk size of 10. This means that our step processes 10 items and then commits all of them in a single commit.

The processing order is as follows:
ItemReader, ItemProcessor, ItemReader, ItemProcessor,… until we have read and processed our 10 items. After that, all 10 items get handed over to the ItemWriter, which stores them somewhere. Then a commit happens and the loop starts over with the next items.

If you call JobOperator#stop(executionId) for a Chunk Step, the loop which invokes the ItemReader, ItemProcessor and ItemWriter will finish reading and processing the current Item. It then hands all the Items collected so far over to the ItemWriter. After that the loop exits gracefully.
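The chunk semantics described above can be modelled in a few lines. This is a simplified, container-free sketch. ChunkLoop and its upper-casing ‘processor’ are hypothetical. It assumes the stop request is checked after each processed item, as described above. The current item is finished, the partial chunk is written and committed, and then the loop exits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Simplified model of a chunk step: read + process up to chunkSize items, then write and commit. */
final class ChunkLoop {
    /** Each entry is what the ItemWriter received for one committed chunk. */
    final List<List<String>> committedChunks = new ArrayList<>();

    void run(Iterator<String> reader, int chunkSize, BooleanSupplier stopRequested) {
        while (reader.hasNext()) {
            List<String> chunk = new ArrayList<>();
            while (chunk.size() < chunkSize && reader.hasNext()) {
                String item = reader.next();   // ItemReader
                chunk.add(item.toUpperCase()); // ItemProcessor (hypothetical transformation)
                if (stopRequested.getAsBoolean()) {
                    break;                     // current item is done, read nothing more
                }
            }
            committedChunks.add(chunk);        // ItemWriter, followed by the commit
            if (stopRequested.getAsBoolean()) {
                return;                        // graceful exit after the last commit
            }
        }
    }
}
```

With 25 items, a chunk size of 10 and a stop request arriving after the 13th item, this model commits one full chunk of 10 and one partial chunk of 3.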

That’s nice and clean! But what about Batchlets?

Stopping a ‘Batchlet’

There is a good reason why I’m writing this post today. In the last few weeks we had a few Batchlets which didn’t behave ‘nicely’ towards our ops team. Those beasts didn’t want to stop working! Of course the problem only occurred in production and not in any of the tests done before. In production we have millions of items to process, whereas in the tests they just fed in a few thousand items.

So why didn’t those Batchlets stop?

First we have to understand what a Batchlet is. As opposed to a Chunk Step, a Batchlet is a ‘do-it-yourself’ thingy. The JBatch runtime really hands over all control to your code. It doesn’t even do Transactions for you! It is really all in your hands. Usually such a Batchlet contains a processing loop as well:

import java.util.List;
import javax.batch.api.AbstractBatchlet;

public class MyBatchlet extends AbstractBatchlet {
  @Override
  public String process() throws Exception {
    List<MyItem> items = readAllItemsToProcess();
    for (MyItem item : items) {
      processAndStoreMyItem(item);
    }
    return "OK";
  }
}

That’s nice… but it won’t stop for you…

So what is missing? Yes, AbstractBatchlet implements an empty stop() method. And this is often a bad idea…

Our code should rather look something like the following:

import java.util.List;
import javax.batch.api.Batchlet;

public class MyBatchlet implements Batchlet {
  // written by stop() on another thread, read by process() on the worker thread
  private volatile boolean shouldStop = false;

  @Override
  public void stop() {
    shouldStop = true;
  }

  @Override
  public String process() throws Exception {
    List<MyItem> items = readAllItemsToProcess();
    for (MyItem item : items) {
      processAndStoreMyItem(item);
      if (shouldStop) {
        return "STOPPING";
      }
    }
    return "OK";
  }
}

There are a few important details:
1.) The boolean shouldStop field really needs to be volatile. The stop() method gets called from a different thread, and without volatile the new value might never become visible to the worker thread. Read up more on volatile in Angelika Langer’s excellent Java Memory Model talk.

2.) I’m thinking about forbidding “extends AbstractBatchlet” via a checkstyle rule. AbstractBatchlet is actually not worth having. People should be aware that they are missing the stop() functionality!
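Here is a container-free sketch of point 1. StoppableWorker is hypothetical and mirrors the Batchlet above. stop() is called from a second thread, and the volatile flag makes that write visible to the worker. process() then returns "STOPPING" after the current item. Thread.sleep(1) stands in for real item processing.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the stoppable Batchlet: stop() and process() run on different threads. */
final class StoppableWorker {
    private volatile boolean shouldStop = false; // volatile: the write in stop() must reach the worker
    final CountDownLatch firstItemDone = new CountDownLatch(1);
    volatile int processed = 0;

    void stop() {
        shouldStop = true;
    }

    String process(int totalItems) throws InterruptedException {
        for (int i = 0; i < totalItems; i++) {
            Thread.sleep(1);          // stands in for processAndStoreMyItem(item)
            processed++;              // only the worker thread writes this counter
            firstItemDone.countDown();
            if (shouldStop) {
                return "STOPPING";    // leave after the current item
            }
        }
        return "OK";
    }
}
```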

JTA vs resource-local performance

A few years ago I did a simple test to see how JTA transaction handling (via UserTransaction) compares to resource-local transaction handling. Back then, using JTA had a rather big impact on performance. Time to repeat this test with a modern EE server.

So I went ahead and created a very simple JPA sample which runs a loop and creates 1000 Customer entries in a simple h2 memdb. I chose h2 memdb because it’s pretty fast, at least much faster than any production-ready DB which stores its data on disk. After all, we want to measure the performance of JTA, not benchmark the database.

My simple sample can be downloaded at https://github.com/struberg/jtabench
To start it just run
$> mvn clean install tomee:run

So far my tests don’t show a huge problem.

When I run the benchmark against the resource-local part (http://localhost:8080/jtabench/customer/nonjta) I get
Resource-Local: 21.6 pages/second.
That means 21600 inserts per second.

If I run the very same benchmark against the JTA part (http://localhost:8080/jtabench/customer/jta) I get about
JTA: 19.0 pages/second.
That means about 19000 inserts per second.
And please remember that h2 memdb is really fast! Thus with a real database load the difference will simply be negligible.
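To put these numbers into perspective, here is the per-insert arithmetic. It assumes 1000 inserts per page, as in the sample, and averages over all inserts. The 1 ms per insert for a disk-backed production database in the last line is a hypothetical round number, not a measurement.

```latex
\begin{aligned}
t_{\mathrm{RL}}  &= \frac{1\ \mathrm{s}}{21\,600} \approx 46.3\ \mu\mathrm{s} \quad \text{per insert (resource-local)}\\
t_{\mathrm{JTA}} &= \frac{1\ \mathrm{s}}{19\,000} \approx 52.6\ \mu\mathrm{s} \quad \text{per insert (JTA)}\\
\Delta t &= t_{\mathrm{JTA}} - t_{\mathrm{RL}} \approx 6.3\ \mu\mathrm{s}, \qquad \frac{21.6 - 19.0}{21.6} \approx 12\,\%\ \text{fewer pages/s}\\
\frac{\Delta t}{t_{\mathrm{DB}}} &\approx \frac{6.3\ \mu\mathrm{s}}{1\ \mathrm{ms}} \approx 0.6\,\% \quad \text{with a hypothetical } t_{\mathrm{DB}} = 1\ \mathrm{ms}\ \text{per insert}
\end{aligned}
```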

If you reproduce the test yourself locally, don’t forget to clean the databases in between benchmark runs with http://localhost:8080/jtabench/customer/reset . This will delete all temporarily created Customer entries in the DBs.

Note: I’m not quite sure how much optimization geronimo-tx applies if there is only a single DataSource involved. I still need to dig into that. I will probably provide a follow-up test with 2 or more different databases…

The (mostly) unknown story behind javax.ejb.EJBException

Yesterday I blogged about the impact Exceptions have on JavaEE transactions in EJBs.
But there is another funny EJB Exception mechanism waiting to be discovered: the javax.ejb.EJBException.

This handling dates back to the times when EJB was mainly intended as a competitor to NeXT Distributed Objects and Microsoft DCOM. Back in those days it was all about ‘Client-Server Architecture’, and people tried to spread the load across different servers on the network. A single EJB back then needed 4 classes and was inherently remote by default.

Only much later did EJBs get a ‘local’ behaviour as well. And only with EJB-3.1 was the No-Interface View (NIV) introduced, which made business interfaces optional; a NIV bean is local-only.
But for a very long time remoting was THE default operation mode of EJBs. So all the behaviour was tailored around it, regardless of whether you are really using remoting or are running in the very same JVM.

The impact of remoting

The tricky situation with remote calls is that you cannot be sure that every class is available on the client.

Imagine a server which uses JPA. This might throw a javax.persistence.EntityNotFoundException. But what if the caller, a Swing EJB client app, doesn’t have any JPA classes on its classpath?
This will end up in a ClassNotFoundException or NoClassDefFoundError because de-serialisation of the EntityNotFoundException will blow up on the client side.

To prevent this from happening, the server will in certain cases serialize a javax.ejb.EJBException instead of the originally thrown Exception. The EJBException will contain the original Exception’s stack trace as plain Strings. So you at least have the information about what went wrong in a human-readable format.

If you’d like to read up on all the details, check out section 9.4 “Client’s View of Exceptions” in the EJB specification.

Switching on the ‘Auto Pilot’

Some containers, e.g. OpenEJB/TomEE, use a dual strategy. A ‘container’ object (ThrowableArtifact) wraps the original Throwable plus its String interpretation. Both get sent over the wire, with the String form serving as a fallback.

On the client side the de-serialization logic of ThrowableArtifact first tries to de-serialize the original Exception. Whenever this is possible, you will get the originally thrown Exception on the client side. If that doesn’t work, we fall back to the transmitted information and throw an EJBException with the meta information as Strings instead.
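The dual strategy can be sketched with plain Java serialization. This is not TomEE’s ThrowableArtifact, just an illustration under several assumptions. ThrowableEnvelope is a hypothetical name. Missing client classes are simulated by an ObjectInputStream that refuses to resolve given class names. A plain RuntimeException stands in for the EJBException fallback.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Set;

/** Hypothetical envelope: ships the serialized Throwable AND its stack trace as text. */
final class ThrowableEnvelope {

    static byte[] wrap(Throwable t) throws IOException {
        ByteArrayOutputStream original = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(original)) {
            oos.writeObject(t);
        }
        StringWriter trace = new StringWriter();
        t.printStackTrace(new PrintWriter(trace)); // the human-readable fallback
        byte[] originalBytes = original.toByteArray();
        byte[] traceBytes = trace.toString().getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream envelope = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(envelope)) {
            out.writeInt(originalBytes.length);
            out.write(originalBytes);
            out.writeInt(traceBytes.length);
            out.write(traceBytes);
        }
        return envelope.toByteArray();
    }

    /** @param missingOnClient class names to treat as absent, simulating a slim client classpath */
    static Throwable unwrap(byte[] envelope, Set<String> missingOnClient) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(envelope));
        byte[] originalBytes = new byte[in.readInt()];
        in.readFully(originalBytes);
        byte[] traceBytes = new byte[in.readInt()];
        in.readFully(traceBytes);
        try (ObjectInputStream ois = new ClientObjectInputStream(new ByteArrayInputStream(originalBytes), missingOnClient)) {
            return (Throwable) ois.readObject(); // best case: the original exception type
        } catch (ClassNotFoundException e) {
            // fallback, analogous to receiving an EJBException with the trace as plain text
            return new RuntimeException("Remote exception:\n" + new String(traceBytes, StandardCharsets.UTF_8));
        }
    }

    /** ObjectInputStream that pretends some classes are not on the client's classpath. */
    private static final class ClientObjectInputStream extends ObjectInputStream {
        private final Set<String> missing;

        ClientObjectInputStream(InputStream in, Set<String> missing) throws IOException {
            super(in);
            this.missing = missing;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
            if (missing.contains(desc.getName())) {
                throw new ClassNotFoundException(desc.getName());
            }
            return super.resolveClass(desc);
        }
    }
}
```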

The impact on your program?

The main impact for you as a programmer is that you might need to catch not only the original Exception but also an EJBException. So this really changes the way your code needs to be written.
And of course, if you only get the EJBException, you don’t know exactly what was really going on. If you need to react to different Exceptions in different ways, you might try to parse the exception message, but you no longer have a type-safe way to do it. In that case it might be better to catch the Exception on the server side and send an explicit @ApplicationException over the wire.

When do I get the EJBException and when do I get the original one?

I’d be happy to have a good answer myself 😉

My experience so far is that it is not specified well enough when each of them gets thrown. But there are a few coarse-grained groups of container behaviour:

  • Containers with Auto-Pilot mode: those containers will always try to give you the original Exception, and only give you an EJBException if that is technically impossible. TomEE, for example, works that way.
  • Containers which use the original Exception for ‘local’ calls and EJBException for remote calls.
  • Containers which will always give you an EJBException, even for local invocations. I have not seen those for quite some time though. Not sure whether this is still state of the art?

Any feedback about which container behaves which way is welcome. And obviously also if you think there is another category!

Transaction handling in EJBs and JavaEE7 @Transactional

Handling transactions in EJBs is easy, right? Well, in theory it should be. But how does the theory translate into reality once you leave the ivory tower?

I’ll show you a small example. Let’s assume we have 2 infrastructure service EJBs:

@Stateless
public class StorageServiceImpl implements StorageService {
  private @EJB CustomerService customerService;
  private @PersistenceContext EntityManager em;

  public void chargeStorage(int forYear) throws CustomerNotFoundException {
    storeNiceLetterInDb(em);
    Customer c = customerService.getCurrentCustomer(); 
    doSomethingElseInDB(); 
  }
} 

And now for the CustomerService which is an EJB as well:

@Stateless
public class CustomerServiceImpl implements CustomerService {
  public Customer getCurrentCustomer() throws CustomerNotFoundException {
    // do something if there is a current customer
    // otherwise throw a CustomerNotFoundException
  }
}

The Sunshine Case

Let’s first look at what happens if no problems occur at runtime.

In normal operation, e.g. a JSF backing bean will call storageService.chargeStorage(2015). The implicit transaction interceptor will use a TransactionManager (all done in the interceptor, which you do not see in your code) to check whether a Transaction is already open. If not, it will open a new transaction and remember this fact. The same check will happen in the implicit transaction interceptor for the CustomerService.

When leaving CustomerService#getCurrentCustomer, the interceptor will recognize that it didn’t open the transaction and thus will simply return. When leaving StorageService#chargeStorage, on the other hand, its interceptor will commit the transaction and close the EntityManager.

Broken?: Handling checked Exceptions

Once we leave the sunny side of the street and hit some problems, the whole handling starts to become messy. Let’s look at what happens if a checked CustomerNotFoundException is thrown in CustomerService#getCurrentCustomer. Here most people will find their first surprise: the database changes done in storeNiceLetterInDb() will get committed into the database.

So we got an Exception but the transaction still got committed? WT*piep*!
Too bad this is not a bug: the behaviour is exactly as specified in “9.2.1 Application Exceptions” of the EJB specification:

An application exception does not automatically result in marking the transaction for rollback unless the ApplicationException annotation is applied to the exception class and is specified with the rollback element value true…

So this means we could annotate the CustomerNotFoundException with @javax.ejb.ApplicationException(rollback=true) to force a rollback.
And of course we need to do this for ALL checked exceptions if we like to get a rollback.

Broken?: Handling unchecked Exceptions

The good news up front: unchecked Exceptions (RuntimeExceptions) will usually cause a rollback of your transaction (unless annotated with @ApplicationException(rollback=false), of course).

Let’s assume there is some other entity lookup in the code and we get a javax.persistence.EntityNotFoundException if the address of the customer couldn’t be found. This will roll back your transaction.

But what can we do if this is kind of expected and you’d just like to use a default address in that case? The natural solution would be to simply catch this Exception in the calling method. In our case that would be a try/catch block in StorageServiceImpl#chargeStorage.

That’s a great idea – but it doesn’t work in many containers!

Some containers interpret the spec pretty strictly and do the Exception check on _every_ layer (EJB spec 9.3.6). If the interceptor of the CustomerService detects an Exception, the implicit EJB interceptor will simply mark the whole transaction as “rollbackOnly”. Catching this Exception on an outer level doesn’t help a bit. You will not get your changes into the database. And if you try to do even more on the database, you will blow up again with something like “The connection was already marked for rollback”.

And how is that with @javax.transaction.Transactional?

Basically the same as with EJBs. In my opinion this was a missed chance to clean up this behaviour.
You can read this up in chapter 3.6 of the JTA-1.2 specification.

The main difference is how to demarcate rollback vs commit exceptions. You can use the rollbackOn and dontRollbackOn attributes of @Transactional:

@Transactional(rollbackOn={SQLException.class}, dontRollbackOn={SQLWarning.class})

Now what about DeltaSpike @Transactional?

In Apache DeltaSpike @Transactional and its predecessor Apache MyFaces CODI @Transactional we have a much cleaner handling:

Exceptions only get handled on the layer where the transaction got opened. If you catch an Exception along the way, then we do not care about it.

Any Exception on the outermost layer will cause a rollback of your transaction. It doesn’t matter if it is a RuntimeException or a checked Exception.

If there was no Exception in the outermost interceptor then we will commit the transaction.
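A toy interceptor can model the difference between strict per-layer handling and outermost-only handling. This is a sketch under simplifying assumptions. ToyTxInterceptor is hypothetical, @ApplicationException is ignored, and outcomes are only recorded instead of going through a real TransactionManager. STRICT marks the transaction rollback-only whenever any layer sees a RuntimeException. OUTERMOST_ONLY decides solely on the layer that opened the transaction and rolls back on any exception there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical single-threaded model of two transaction-interceptor policies. */
final class ToyTxInterceptor {
    enum Policy { STRICT, OUTERMOST_ONLY }

    private final Policy policy;
    private int depth = 0;                // nesting level of intercepted calls
    private boolean rollbackOnly = false;
    final List<String> outcomes = new ArrayList<>();

    ToyTxInterceptor(Policy policy) {
        this.policy = policy;
    }

    <T> T invoke(Callable<T> businessMethod) throws Exception {
        boolean opened = depth == 0;      // only the outermost layer "begins" the transaction
        if (opened) {
            rollbackOnly = false;
        }
        depth++;
        try {
            T result = businessMethod.call();
            if (opened) {
                outcomes.add(rollbackOnly ? "rollback" : "commit");
            }
            return result;
        } catch (Exception e) {
            if (policy == Policy.STRICT && e instanceof RuntimeException) {
                rollbackOnly = true;      // EJB-style: any layer dooms the tx on unchecked exceptions
            }
            if (opened) {
                // OUTERMOST_ONLY: any exception leaving the outermost layer rolls back
                // STRICT: a checked exception still commits unless the tx was marked earlier
                boolean rollback = policy == Policy.OUTERMOST_ONLY || rollbackOnly;
                outcomes.add(rollback ? "rollback" : "commit");
            }
            throw e;
        } finally {
            depth--;
        }
    }
}
```

Suppose a RuntimeException is thrown in an inner call and caught by the outer one. STRICT then ends in a rollback, while OUTERMOST_ONLY commits. When a checked exception leaves the outermost layer, it is the other way around.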

PS: please note that I explicitly used interfaces in my samples. Otherwise you will get NIV (No-Interface View) objects, which again might behave slightly differently, as they use a totally different proxying technique and default behaviour. But that might be enough material for yet another blog post.
PPS: I also spared you EJBs with TransactionManagementType.BEAN. That one is also pretty much non-portable by design: it forces you to either commit or roll back the tx on every layer, so you effectively cannot nest them. Some containers handle this fine while others really enforce it.

The right Scope for JBatch Artifacts

In my recent JavaLand conference talk about JSR-352 JBatch and Apache BatchEE I briefly mentioned that JBatch Artifacts should have a scope of @Dependent (or Prototype scope if you are using Spring). Too bad there was not enough time to dig into the problem in depth, so here comes the detailed explanation.

What is a JBatch Artifact

A JBatch batch needs a Job Specification Language (JSL) XML file in META-INF/batch-jobs/*.xml. This file describes how your batch job is built up.

Here is a small example of what such a batch JSL file could look like:

<job id="mysamplebatch" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
  <step id="mysample-step">
    <listeners>
      <listener ref="batchUserListener">
        <properties>
          <property name="batchUser" value="#{jobParameters['batchUser']}"/>
        </properties>
      </listener>
    </listeners>
    <batchlet ref="myWorkerBatchlet">
      <properties>
        <property name="inputFile" value="#{jobParameters['inputFile']}"/>
      </properties>
    </batchlet>
  </step>
</job>

In JSR-352, Artifacts are all the pieces which are defined in your JBatch JSL file and get requested by the container. In the sample above these would be batchUserListener and myWorkerBatchlet.

The following types can be referenced as Batch Artifacts from within your JSL:

  • Batchlets
  • ItemReader
  • ItemProcessor
  • ItemWriter
  • JobListener
  • StepListener
  • CheckpointAlgorithm
  • Decider
  • PartitionMapper
  • PartitionReducer
  • PartitionAnalyzer
  • PartitionCollector

The Batch Artifact Lifecycle

The JBatch spec is actually pretty clear about which lifecycle needs to be applied to Batch Artifacts:

11.1 Batch Artifact Lifecycle
All batch artifacts are instantiated prior to their use in the scope in which they are declared in the Job XML and are valid for the life of their containing scope. There are three scopes that pertain to artifact lifecycle: job, step, and step-partition.
One artifact per Job XML reference is instantiated. In the case of a partitioned step, one artifact per Job XML reference per partition is instantiated. This means job level artifacts are valid for the life of the job. Step level artifacts are valid for the life of the step. Step level artifacts in a partition are valid for the life of the partition.
No artifact instance may be shared across concurrent scopes. The same instance must be used in the applicable scope for a specific Job XML reference.

The problem is that whenever you use a JavaEE component (e.g. an EJB) as a Batch Artifact, you might only get a proxy. Of course the reference to this proxy gets thrown away correctly, but the instance behind the proxy might survive. Let’s look at how this works internally.

How Batch Artifacts get resolved

A JBatch implementation can provide its own mechanism to load the artifacts. This is needed because it obviously makes a difference whether you use BatchEE with CDI or Spring Batch (which also implements JSR-352).
In general there are 3 different ways you can reference a Batch Artifact in your JSL:

  1. Via a declaration in an optional META-INF/batch.xml file. See section 10.7.1 of the specification for further information.
  2. Via its fully qualified class name.
    In BatchEE we first try to get the class via BeanManager#getBeans(Type) and BeanManager#getReference. If that doesn’t give us the desired Contextual Reference (CDI proxy), we simply call ClassLoader#loadClass, create the Batch Artifact with newInstance() and apply injection to this instance.
  3. Via its EL name. More precisely, we use BeanManager#getBeans(String elName) plus a call to BeanManager#getReference() as shown above.

We now know what a Batch Artifact is. Whenever you are on a JavaEE Server you will most likely end up with a CDI or EJB Bean which got resolved via the BeanManager. If you are using Spring-Batch then you will most times get a nicely filled Spring bean.

The right Scope for a Batch Artifact

I’ve seen @javax.ejb.Stateless used on Batch Artifacts in many samples. I guess the people writing such samples have never used JBatch in real production yet 😉 Why so? Well, let’s look at what would happen if we implemented our StepListener as a stateless EJB:

@javax.ejb.Stateless
@javax.inject.Named // this makes it available for EL lookup
public class BatchUserListener implements StepListener {
  @Inject 
  @BatchProperty
  private String batchUser;

  @Override
  public void beforeStep() throws Exception {
    setUserForThread(batchUser);
  }

  @Override
  public void afterStep() throws Exception {
    clearUserForThread();
  }
}

Now let’s assume that the BatchUserListener is used not only in my sample batch but in 30 other batches of my company (this ‘sample’ is actually taken from a HUGE real-world project where we have been using Apache BatchEE for over a year now).

What will happen if e.g. a ‘DocumentImport’ batch runs before my sample batch? The first batch that uses this StepListener will create the instance. At the time the instance gets created by the container (and ONLY at that time), the container also performs all the injection. That means it will look up the ‘batchUser’ parameter and inject it into my @BatchProperty String. Let’s assume this DocumentImport batch uses a ‘documentImportUser’. So this is what we will get injected into the ‘batchUser’ variable.

Once the batch step is done, the @Stateless instance might be put back into some pool. And if I’m rather unlucky, exactly this very instance will later get re-used for mysample-step. But since the listener already exists, no injection will be performed on that instance. This means that the StepListener STILL contains the ‘documentImportUser’ and not the ‘mySampleUser’ which I explicitly set as a parameter of my batch.
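The pooling pitfall can be reproduced without a container. In this toy model, UserListener and ArtifactFactory are hypothetical. ‘Injection’ happens only in the constructor, just like field injection that runs once when the container creates an instance. A pooled instance handed out again keeps its old batchUser, while a @Dependent-style fresh instance sees the current value.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical StepListener stand-in: batchUser is "injected" only when the instance is created. */
final class UserListener {
    final String batchUser;

    UserListener(String batchUser) {
        this.batchUser = batchUser;
    }
}

/** Toy model of two lifecycles: a pooled (@Stateless-like) one and a @Dependent-like one. */
final class ArtifactFactory {
    private final Deque<UserListener> pool = new ArrayDeque<>();

    /** Pooled: an idle instance is reused as-is, so the property lookup does NOT run again. */
    UserListener borrowPooled(Supplier<String> batchProperty) {
        UserListener idle = pool.poll();
        return idle != null ? idle : new UserListener(batchProperty.get());
    }

    void release(UserListener listener) {
        pool.push(listener);
    }

    /** Dependent: a fresh instance per use, so injection always sees the current job's value. */
    UserListener createDependent(Supplier<String> batchProperty) {
        return new UserListener(batchProperty.get());
    }
}
```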

The very same issue will also happen for all injected variables which do not use proxies, e.g.:

  • @Inject StepContext
  • @Inject JobContext

TL;DR: The Solution

Use @Dependent scoped beans for your Batch Artifacts and only use another scope if you really know what you are doing.

If you’d like to share code across different items of a Step or a Job, you can also use BatchEE’s @StepScoped and @JobScoped, which are available through a portable BatchEE CDI module.

CDI in EARs

Foreword

I was banging my head against the wall for the last few days when trying to solve a few tricky issues we saw with EAR support over at Apache DeltaSpike. I’m writing this up to clear my mind and to share my knowledge with other EAR-pig-wrestlers…

The EAR ClassLoader dilemma

EARs are a constant pain when it comes to portability between servers. This has to do with the fact that JavaEE still doesn’t define any standards for visibility. There is no clear rule about how the ClassLoaders, isolation and visibility have to be set up. There is just a single paragraph (JavaEE7 spec 8.3.1) about which classes you might see.

There are 2 ClassLoader setups we see frequently in EARs.
For the sake of further discussion we assume an EAR with the following structure:

sample.ear
├── some-ejb.jar
├── lib
│   ├── some-shared.jar
│   └── another-shared.jar
├── war1.war
│   └── WEB-INF
│       ├── classes 
│       └── lib
│           └── war1-lib.jar
└── war2.war
    └── WEB-INF
        ├── classes 
        └── lib
            └── war2-lib.jar

Flat ClassLoader

The whole EAR is served by just a single flat ClassLoader. If you have 2 WARs inside your application, then the classes in each of them can see each other, and they can also see the classes in the shared EAR lib folder. This is used e.g. in JBoss-4 (though not from the beginning). You can also configure most of the other containers to use this setup for your EAR. But nowadays it’s hardly a default anymore (and boy is that good!)

Hierarchic ClassLoader

This is the setup used by most containers these days, but it’s still not mandated by the spec! The container provides a shared EAR application ClassLoader which itself only contains the shared EAR libs. This is the parent ClassLoader of all the other ClassLoaders in the EAR. Each WAR and ejb-jar inside your EAR gets its own child WebAppClassLoader.

This means war2 doesn’t see any classes or resources from war1 and the other way around. It further means that the shared libs do not see the classes of war1 nor war2, etc!

If you need some caches in your shared libs, then you need to rely on the ThreadContextClassLoader (TCCL) as outlined in JavaEE 7 paragraph 8.2.5. While this section is about “Dynamic Class Loading”, it is also valid for caches and for storing other dynamic information in static variables. Otherwise you end up mixing values from war1 and probably re-using them in war2 (where you will get a ClassNotFoundException). Even if your 2 WARs contain the same jar (e.g. commons-lang.jar), the Class instances are different as they come from different ClassLoaders. If you store those in a shared-lib jar, then you will most probably end up with the (in)famous “Cannot cast class Xxx to class Xxx”.

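One common solution to this problem looks something like this:

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class MyInfoStore {
  // one Set<Info> per TCCL, i.e. per WAR in a hierarchic EAR setup
  private final Map<ClassLoader, Set<Info>> infoMap = new ConcurrentHashMap<>();

  public void storeInfo(Info info) {
    ClassLoader tccl = Thread.currentThread().getContextClassLoader(); // probably guarded for SecurityManager
    infoMap.computeIfAbsent(tccl, cl -> ConcurrentHashMap.newKeySet()).add(info);
  }
...
}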

Of course you must be really careful to clean up this Map during shutdown. In CDI applications you can use an @Observes BeforeShutdown observer method to trigger the cleanup.
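A slightly more complete version of the store, including the cleanup, might look like the following sketch. It assumes `String` in place of `Info`, and the names `TcclInfoStore` and `release()` are hypothetical. In a CDI application you would call `release()` from an `@Observes BeforeShutdown` observer method; that part is left out so the sketch runs without a container. Two empty child ClassLoaders play the role of the WebAppClassLoaders of war1 and war2.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TcclInfoStore {
    // One entry per TCCL, i.e. per WAR in a hierarchic EAR setup.
    private final Map<ClassLoader, Set<String>> infoMap = new ConcurrentHashMap<>();

    public void storeInfo(String info) {
        infoMap.computeIfAbsent(currentLoader(), cl -> ConcurrentHashMap.newKeySet()).add(info);
    }

    // Only returns what was stored under the current TCCL.
    public Set<String> getInfos() {
        Set<String> infos = infoMap.get(currentLoader());
        return infos == null ? Collections.emptySet() : Collections.unmodifiableSet(infos);
    }

    // Drops everything stored for the given loader; call this when that module shuts down.
    public void release(ClassLoader loader) {
        infoMap.remove(loader);
    }

    public int size() {
        return infoMap.size();
    }

    private static ClassLoader currentLoader() {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        // ConcurrentHashMap rejects null keys, so fall back to this class's own loader
        return tccl != null ? tccl : TcclInfoStore.class.getClassLoader();
    }

    public static void main(String[] args) {
        TcclInfoStore store = new TcclInfoStore();
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        // two empty child loaders standing in for the WebAppClassLoaders of war1 and war2
        ClassLoader war1 = new ClassLoader(original) { };
        ClassLoader war2 = new ClassLoader(original) { };
        try {
            thread.setContextClassLoader(war1);
            store.storeInfo("war1-info");
            thread.setContextClassLoader(war2);
            store.storeInfo("war2-info");
            System.out.println(store.getInfos()); // [war2-info]: war1's entry is not visible here
        } finally {
            thread.setContextClassLoader(original);
        }
        store.release(war1);
        store.release(war2);
        System.out.println(store.size()); // 0
    }
}
```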

The impact on us programmers

These various scenarios make it really hard to write any form of portable framework that runs fine inside of EARs. This is not only true for client frameworks but also for container frameworks themselves, like CDI and Spring.

Integrating CDI containers in EARs

It is pretty obvious that, mostly due to the lack of a guaranteed default isolation scenario in JavaEE, CDI containers have a hard time finding a nice and portable way of handling CDI beans and Extensions in EARs. I got involved in CDI in late 2008 when the spec was still named WebBeans. And that name was taken literally: it originally only targeted web applications and not EARs. EAR support only got roughly added (according to some interpretations) shortly before the EE-6 specification got published. So there are multiple reasons why CDI in EARs is not really a first class citizen yet.

To my knowledge there are 3 sane ways for a container to integrate CDI in EARs. And of those 3 sane ways, 5 are used in various containers 😉

A.) 1 BeanManager per EAR

The whole EAR is handled via 1 BeanManager. This seems to be the way JBoss WildFly handles things. First the BeanManager gets created and all its CDI Extensions get loaded (TODO research: all or only the ones from the shared libs?).

In reality it’s a bit more complicated. Weld uses a separate BeanManager for each and every JAR in your EAR. Don’t ask me why, but that’s what I’ve seen. Still, you only get one set of Extensions for your whole EAR. Keep this in mind.

The TCCL always seems to stay the shared EAR ApplicationClassLoader during boot time, even while processing the classes of the WAR files. Thus the TCCL during @Observes ProcessAnnotatedType will be the shared EAR ApplicationClassLoader. But if you access your Extensions (or the information collected by them) later at runtime, you might end up with a different TCCL. This is especially true for all servlet requests in your WARs. This makes it really hard to store anything with the usual Map<ClassLoader, Set<Info>> trick.

B.) 1 BeanManager per WAR + 1 ‘shared’ BeanManager

Each WAR fully boots up its own BeanManager. The shared EAR libs also get their own BeanManager to serve non-servlet requests like JMS and remote EJB invocations. Classes in shared EAR libs simply get discovered over again for each WAR. Each WAR also has its own set of Extensions (they are usually 1:1 to the BeanManager). This is what I’ve seen in Apache Geronimo, IBM WebSphere and early Apache TomEE containers.

This is a bit tricky when it comes to handling the @ApplicationScoped context (see CDI-129 for more information) or EJBs (which are by nature shared across the EAR). WebSphere, e.g., seems to solve this by having a separate Bean<SomeSharedEarLibClass> instance for each BeanManager (and thus for each WAR), but they share a single ApplicationContext storage and those beans are equals(). Probably they just compare the PassivationCapable#getPassivationId()?

It usually works fairly well and it allows the usage of most common programming patterns. It also allows you to modify classes from shared libs via Extensions you register in a single WAR. The downside is that you have to scan and process all the shared classes over and over again for each WAR. This obviously slows down the application boot time.

Oh, strictly speaking this mode also conflicts with the modularity requirements in section 5 of the CDI specification. Overall I’m not very happy with section 5, but it should be mentioned.

C.) Hierarchic BeanManagers

In this case we have a 1:1 relation between a ClassLoader and the BeanManager. The shared EAR libs get scanned by an EAR-BeanManager. Then each of the WARs gets scanned via its own WebAppBeanManager. In contrast to scenario B, these WebAppBeanManagers only scan the classes in the local WAR’s WEB-INF/lib and WEB-INF/classes and not the shared EAR libs over again. If you need information from shared EAR lib classes, then the WebAppBeanManager simply delegates to its ‘parent’ EAR-BeanManager. Apache TomEE uses this mode since 1.7.x.

There are a few tricks the container vendor needs to apply in this case, e.g. propagating events and Bean lookups to the parent BeanManager, etc. Or suppressing the sending of CDI Extension events to the parent (we recently learned this the hard way; it is fixed in TomEE-1.7.2, which is to be released soon).
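The delegation in scenario C can be pictured with a tiny toy registry. This is purely an illustration under my own assumptions and NOT how TomEE or any CDI container is implemented: `ToyBeanRegistry` is hypothetical, it looks locally first and then asks its parent, and the parent never looks down into its children. That last property is exactly what bites Extensions in the next paragraph.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class HierarchicRegistryDemo {

    // Hypothetical toy model of scenario C; NOT a CDI API.
    static final class ToyBeanRegistry {
        private final ToyBeanRegistry parent; // null for the EAR-level registry
        private final Map<Class<?>, Object> beans = new HashMap<>();

        ToyBeanRegistry(ToyBeanRegistry parent) {
            this.parent = parent;
        }

        <T> void register(Class<T> type, T instance) {
            beans.put(type, instance);
        }

        // Look locally first, then delegate to the parent, like a WebAppBeanManager
        // asking its EAR-BeanManager for beans from the shared EAR libs.
        <T> Optional<T> lookup(Class<T> type) {
            Object local = beans.get(type);
            if (local != null) {
                return Optional.of(type.cast(local));
            }
            return parent == null ? Optional.empty() : parent.lookup(type);
        }
    }

    static final class SharedService { } // lives in a shared EAR lib
    static final class War1Bean { }      // lives in war1.war

    public static void main(String[] args) {
        ToyBeanRegistry ear = new ToyBeanRegistry(null);
        ToyBeanRegistry war1 = new ToyBeanRegistry(ear);
        ToyBeanRegistry war2 = new ToyBeanRegistry(ear);

        ear.register(SharedService.class, new SharedService());
        war1.register(War1Bean.class, new War1Bean());

        System.out.println(war1.lookup(SharedService.class).isPresent()); // true: delegated to the parent
        System.out.println(war2.lookup(War1Bean.class).isPresent());      // false: siblings are isolated
        System.out.println(ear.lookup(War1Bean.class).isPresent());       // false: the parent never looks down
    }
}
```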

It also has an important impact on CDI Extension programmers: as your Extensions are 1:1 to the BeanManager, the ClassPath is now also split up across different Extension instances. This works fine for ProcessAnnotatedType but can be tricky in some edge cases. E.g. the DeltaSpike MessageBundleExtension collected info about a certain ProducerBean and stored it in the Extension for later usage in @Observes AfterBeanDiscovery. Too bad that in my case this certain ProducerMethod was in a shared EAR lib and thus got picked up by the Extension instance of the EAR-BeanManager, but the ‘consumer’ (the interface annotated with @MessageBundle) is in some WARs. And the WebAppBeanManager of such a WAR obviously is not the one scanning the ProducerMethod of the class in the shared EAR lib. Thus the Extension threw a NullPointerException. This will be fixed in the upcoming Apache DeltaSpike-1.2.2 release.

The Impact on poor CDI Extension programmers

TODO: this is a living document. I’ll add more info and put it up for review.

Explaining Java Inner Classes

The Problem

Today I had a colleague asking me to help her find a NotSerializableException. The code was like the following:

@ApplicationScoped
public class MyBusinessWindowHelper {
  ...
  public Button.ClickListener createExcelExportClickListener(Table t, String fileName) {
    return new Button.ClickListener() {
      @Override
      public void buttonClick(Button.ClickEvent clickEvent) {
        // some action...
      }
    };
  }
}

The Button.ClickListener and Table are Serializable. The inner class also doesn’t store anything else which is not Serializable, so where is the problem?

People who found the answer in under 1 minute are probably among the top 1% of Java programmers. If you don’t know the answer yet (no shame, this is pretty hardcore stuff!) then read on.

The difference between a static and a non-static inner class

Static inner classes

Let’s first discuss static inner classes. They are pretty much the same as any other class lying around on your disk. In fact it makes almost no difference whether you have the class located inside another class or as a top-level class. In terms of the generated bytecode there is hardly any difference.

Non-static inner classes

Non-static inner classes have access to the members of the outer class.

public class OuterClass
{
    private int meaningOfLife = 42;
    
    public class InnerClass {
        public int getMeaningOfLife() {
            return meaningOfLife;
        }
    }
}

As you can see the InnerClass can return the variable of the OuterClass. But how does that work? How does the inner class know about the outer class?

The answer is stunning but simple: You DON’T get what you see!

Let’s look at the bytecode which gets generated:

$> javac OuterClass.java
$> javap -c OuterClass\$InnerClass.class

will give us the following output:

Compiled from "OuterClass.java"
public class OuterClass$InnerClass {
  final OuterClass this$0;

  public OuterClass$InnerClass(OuterClass);
    Code:
       0: aload_0
       1: aload_1
       2: putfield      #1                  // Field this$0:LOuterClass;
       5: aload_0
       6: invokespecial #2                  // Method java/lang/Object."<init>":()V
       9: return

  public int getMeaningOfLife();
    Code:
       0: aload_0
       1: getfield      #1                  // Field this$0:LOuterClass;
       4: invokestatic  #3                  // Method OuterClass.access$000:(LOuterClass;)I
       7: ireturn
}

There are 2 things which pop out:

final OuterClass this$0;

and

public OuterClass$InnerClass(OuterClass);

But hey, where does this come from? We did not have any constructor with an OuterClass parameter!

The answer is stunning but simple: Java did that for us!

The Java compiler automatically adds a field holding the this reference of the outer class to every non-static inner class. And each constructor also gets an additional parameter carrying that outer this, which is used to initialise the final field. Java can then use this generated outer class field (this$0) to access members of the outer class.
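The hidden field and constructor parameter can also be observed at runtime via reflection. `OuterClassDemo` is a variant of the OuterClass example above, extended with a static nested class for comparison; the helper method names are made up. Note that since Java 11 javac uses nest-based access control, so the `access$000` accessor from the javap listing above no longer shows up, but the synthetic outer reference is still generated, at least as long as the inner class uses it.

```java
import java.lang.reflect.Field;
import java.util.Arrays;

public class OuterClassDemo {
    private int meaningOfLife = 42;

    // non-static: gets a hidden reference to the enclosing OuterClassDemo
    public class InnerClass {
        public int getMeaningOfLife() {
            return meaningOfLife;
        }
    }

    // static: no enclosing instance, behaves like a top-level class
    public static class StaticNested {
    }

    // Both demo classes have exactly one constructor, so index 0 is safe here.
    static Class<?>[] constructorParams(Class<?> type) {
        return type.getDeclaredConstructors()[0].getParameterTypes();
    }

    // True if the compiler added a synthetic field of the given type (javac names it this$0).
    static boolean hasSyntheticFieldOfType(Class<?> type, Class<?> fieldType) {
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic() && f.getType() == fieldType) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // expected: the enclosing class as the single constructor parameter, plus a synthetic field
        System.out.println(Arrays.toString(constructorParams(InnerClass.class)));
        System.out.println(hasSyntheticFieldOfType(InnerClass.class, OuterClassDemo.class));
        // expected: no parameters and no synthetic outer field
        System.out.println(Arrays.toString(constructorParams(StaticNested.class)));
        System.out.println(hasSyntheticFieldOfType(StaticNested.class, OuterClassDemo.class));

        OuterClassDemo outer = new OuterClassDemo();
        OuterClassDemo.InnerClass inner = outer.new InnerClass(); // outer is passed as the hidden argument
        System.out.println(inner.getMeaningOfLife()); // 42
    }
}
```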

And what about anonymous classes?

Well, anonymous classes are just inner classes without any specified name. So when they are created inside an instance method, they also get the outer this pointer as a constructor parameter.

And this is EXACTLY the reason why my colleague got the NotSerializableException: storing the ClickListener in a View and serialising it over to another cluster node also tried to serialise the outer class instance (MyBusinessWindowHelper).

Btw, this is not only an issue if you play hide-and-seek with NotSerializableExceptions, but also if you are hunting down memory leaks!
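The effect can be reproduced with plain Java serialization. The sketch uses a hypothetical `ClickListener` interface instead of Vaadin’s `Button.ClickListener`, and `WindowHelper` plays the role of `MyBusinessWindowHelper`; the listener deliberately touches a field of the helper, as a real click handler calling back into its helper would. One common way out, shown here as an assumption rather than the fix used in the original project, is a static nested or top-level listener class, which only carries its own fields.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class InnerClassSerialization {

    // Hypothetical stand-in for Button.ClickListener: the listener type itself is Serializable.
    interface ClickListener extends Serializable {
        void buttonClick(String event);
    }

    // Plays the role of MyBusinessWindowHelper: note that it is NOT Serializable.
    static class WindowHelper {
        private int exportCount;

        ClickListener createAnonymousListener(String fileName) {
            return new ClickListener() {
                @Override
                public void buttonClick(String event) {
                    exportCount++; // touches the outer instance via the hidden this$0 reference
                    System.out.println("export #" + exportCount + " of " + fileName);
                }
            };
        }

        // A static factory method has no enclosing instance that could be captured.
        static ClickListener createStaticListener(String fileName) {
            return new ExportListener(fileName);
        }
    }

    // Static nested class: only its own fields get serialized.
    static final class ExportListener implements ClickListener {
        private static final long serialVersionUID = 1L;
        private final String fileName;

        ExportListener(String fileName) {
            this.fileName = fileName;
        }

        @Override
        public void buttonClick(String event) {
            System.out.println("export of " + fileName);
        }
    }

    // Returns true if Java serialization succeeds, false on NotSerializableException.
    static boolean isSerializable(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            System.out.println("NotSerializableException: " + e.getMessage()); // names WindowHelper
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WindowHelper helper = new WindowHelper();
        System.out.println(isSerializable(helper.createAnonymousListener("report.xls"))); // false
        System.out.println(isSerializable(WindowHelper.createStaticListener("report.xls"))); // true
    }
}
```

Making the helper Serializable would also silence the exception, but then the whole helper gets serialized along with every single listener.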

A few more thoughts

That also explains why non-static inner classes cannot get proxied (because they do not have a default constructor). Plus it explains why they cannot be used as CDI or EJB beans.

All that is pretty obvious once one knows how the system works internally, isn’t it?
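The missing default constructor is easy to check via reflection. Proxy libraries and CDI containers typically need to instantiate classes without passing an enclosing instance, and the CDI specification explicitly rules out non-static inner classes as managed beans. The class and method names below are hypothetical.

```java
public class NoDefaultConstructorDemo {

    // non-static inner class: its only constructor takes the enclosing instance
    public class InnerBean { }

    // static nested class: gets a real no-arg default constructor
    public static class NestedBean { }

    // Looks up a no-arg constructor, which reflective instantiation would need.
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(InnerBean.class));  // false
        System.out.println(hasNoArgConstructor(NestedBean.class)); // true
    }
}
```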

The tricky CDI ServiceProvider pattern

CDI is really cool, and works great. But it’s not the hammer for all your problems!

I’ve seen the following a lot of times and I confess that I also used it at first: the CDI ServiceProvider pattern. But first let’s clarify what it is:

The ServiceProvider pattern

Consider you have a bunch of pluggable modules and all of them might add an implementation for a certain interface:

// this is just a marker interface for all my plugins
public interface MyPlugin{} 

The idea is that extensions can later easily implement this marker interface and the system will automatically pick up all the implementations accordingly.

public class PluginA implements MyPlugin {
  ...
}

public class PluginB implements MyPlugin {
  ...
}

The known way

I’ve seen lots of blogs recommending the following solution:

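import java.lang.annotation.Annotation;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.enterprise.context.spi.CreationalContext;
import javax.enterprise.inject.spi.Bean;
import javax.enterprise.inject.spi.BeanManager;

@SuppressWarnings("unchecked")
public static <T> List<T> getContextualReferences(Class<T> type, Annotation... qualifiers)
{
    BeanManager beanManager = getBeanManager(); // helper to obtain the BeanManager, not shown here

    Set<Bean<?>> beans = beanManager.getBeans(type, qualifiers);

    List<T> result = new ArrayList<T>(beans.size());

    for (Bean<?> bean : beans)
    {
        // one contextual reference per Bean, no resolve() step involved
        CreationalContext<?> creationalContext = beanManager.createCreationalContext(bean);
        result.add((T) beanManager.getReference(bean, type, creationalContext));
    }
    return result;
}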

The original implementation can be found in the Apache DeltaSpike BeanProvider (and in Apache MyFaces CODI before that).

This worked really well for some time. But suddenly a few surprises happened.

What’s the problem with getBeans()?

One day we got a bug report in Apache OpenWebBeans that OWB is broken because the pattern shown above doesn’t work: JIRA issue OWB-658

What we found out is that the whole pattern doesn’t work anymore once a single plugin is defined as @Alternative.

I dug into the issue and became aware that this is not an OWB issue, but that BeanManager#getBeans() is just not made for this usage! So let’s have a look at what the CDI spec says:

11.3.4:
The method BeanManager.getBeans() returns the set of beans which have the given required type and qualifiers and are available for injection

The important part here is the difference between “give me all beans of type X” and “give me all beans which can be used for an InjectionPoint of type X”. Those 2 are fundamentally different, because in the latter case all other beans are no longer candidates for injection once a single @Alternative annotated bean gets spotted. OWB did it just right!

Possible Solution 1

Don’t use @Alternative on any of your plugin classes 😉

That sounds a bit ridiculous, but at least it works.

Possible Solution 2

You could fire a CDI event which collects all your plugins in a CDI-Extension-like way.
First we need a data object to store all the collected plugins.

import java.util.ArrayList;
import java.util.List;

public class PluginDetection {
  private final List<MyPlugin> plugins = new ArrayList<MyPlugin>();

  public void addPlugin(MyPlugin plugin) {
    plugins.add(plugin);
  }

  public List<MyPlugin> getPlugins() {
    return plugins;
  }
}

Then we fire this event:

private @Inject Event<PluginDetection> pdEvent;

public List<MyPlugin> detectPlugins() {
  PluginDetection pd = new PluginDetection();
  pdEvent.fire(pd);
  return pd.getPlugins();
}

The plugins just need to observe this event and register themselves:

public void register(@Observes PluginDetection pd) {
  pd.addPlugin(this);
}

Since the default for @Observes is Reception.ALWAYS, the contextual instance will automatically get created if it doesn’t yet exist.
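The flow of this event-based collection can be exercised without a container. In the following sketch a hypothetical `ToyEventBus` stands in for the container’s observer notification behind `Event#fire()`, and the lambdas play the role of the `register(@Observes PluginDetection pd)` methods. Since every observer is simply notified, there is no bean resolution step at all, which is why @Alternative has no say here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class PluginEventDemo {

    interface MyPlugin { }

    // Same role as PluginDetection: a payload the observers add themselves to.
    static final class PluginDetection {
        private final List<MyPlugin> plugins = new ArrayList<>();

        void addPlugin(MyPlugin plugin) {
            plugins.add(plugin);
        }

        List<MyPlugin> getPlugins() {
            return Collections.unmodifiableList(plugins);
        }
    }

    // Hypothetical stand-in for the container notifying all observers of an event.
    static final class ToyEventBus {
        private final List<Consumer<PluginDetection>> observers = new ArrayList<>();

        void observe(Consumer<PluginDetection> observer) {
            observers.add(observer);
        }

        void fire(PluginDetection event) {
            observers.forEach(o -> o.accept(event));
        }
    }

    static final class PluginA implements MyPlugin { }
    static final class PluginB implements MyPlugin { }

    public static void main(String[] args) {
        ToyEventBus bus = new ToyEventBus();
        PluginA a = new PluginA();
        PluginB b = new PluginB();
        // each plugin "observes" the event and registers itself
        bus.observe(pd -> pd.addPlugin(a));
        bus.observe(pd -> pd.addPlugin(b));

        PluginDetection detection = new PluginDetection();
        bus.fire(detection);
        System.out.println(detection.getPlugins().size()); // 2: every observer got notified
    }
}
```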

But there is also a small issue with this approach: There is no way to disable/override a plugin with @Alternative anymore, since the messaging doesn’t do any bean resolving at all.

Using JPA in real projects (part 1)

Is JPA only good for samples?

This question is of course a bit provocative. But if you look at all the JPA samples out in the wild, then none of them can be applied to real world projects without fundamental changes.

This post tries to cover a few JPA aspects as well as showing off some maven-foo from a big real world project. I am personally using Apache OpenJPA because it works well and I’m a committer on the project (which means I can immediately fix bugs if I hit one). I will try to motivate my friends from JBoss to provide a parallel guide for Hibernate and maybe we even find some Glassfish/EclipseLink geek.

One of the most fundamental differences between the different JPA providers is where they store the state information for the loaded entities. OpenJPA stores this info directly in the entities (EclipseLink as well afaik) and Hibernate stores it in the EntityManager and sometimes in the 1:n proxies used for lazy loading (if no weaving is used). All this is not defined in the spec but is product-specific behaviour. Please always keep this in mind when applying JPA techniques to another JPA provider.

I’ll have to split this article into 2 parts, otherwise it would be too much for a good read. Today’s part will focus on the general project setup; the 2nd one will cover some coding practices usable for JPA based projects.

The Project Infrastructure and Setup

A general note on my project structure: my project is not a sample but fairly big (40k users, 5 million page hits, 600+ JSF pages) and consists of 10+ WebApps, each of them having their own backend (JPA + db + business logic), frontend (JSF and backing beans) and api (remote APIs) JARs. Thus I have all my shared configuration in the myprj/parent/fe, myprj/parent/be and myprj/parent/api maven modules, whose pom.xml files are referenced as <parent> by all backends, frontends and apis respectively.

├── parent
│   ├── api
│   ├── be (<- here I keep all my shared backend configuration) 
│   ├── fe
├── webapp1
│   ├── api
│   ├── be (referencing ../../parent/be/pom.xml)
│   └── fe
├── webapp2
│   ├── api
│   ├── be (referencing ../../parent/be/pom.xml)
│   └── fe
...

Backend Unit Test Setup

1. All my backend unit tests use TestNG and really do hit the database! A business process test which doesn’t touch the database is worth nothing imo…
We are using a local MySQL installation for the tests and use an Apache Maven profile for switching to other databases like Oracle and PostgreSQL (both of which we use in production).

2. We have a special TestNG test group called createData which other tests can depend on via @Test(dependsOnGroups="createData"). Or we just use @Test(dependsOnMethods="myTestMethodCreatingTheData").
That way all tests which create some pretty complex set of test data run first. All tests which need this data as a base for their own work run afterwards.

3. Each test must be re-runnable and clean up its own mess in @BeforeClass. We use @BeforeClass because this also works if you kill your test in the debugger. Nice goodie: you can also check the produced data in the database later on. Too bad that there is no easy way to automatically prove this. The best bet is to make all your colleagues aware of it and tell them that they have to throw the next party if they introduce a broken or un-repeatable test 😉

The Enhancement Question

I’ve outlined the details and pitfalls of JPA enhancement in a previous post.
I’m a big fan of build-time enhancement because a.) it works nicely with OpenJPA and b.) my TestNG unit tests run much faster (because I only enhance those entities once). I also like the fact that I know exactly what will run on the server, and my unit tests will hit side effects early on. In a big project you will hit enhancement and state side effects, which make your app behave differently in unit tests and on the EE server, more often than you’d guess.
Of course, this might differ if you use another JPA provider.

For enabling build-time-enhancement with OpenJPA I have the following in my parent-be.pom.

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.openjpa</groupId>
                <artifactId>openjpa-maven-plugin</artifactId>
                <version>${openjpa.version}</version>
                <configuration>
                    <includes>
                        ${jpa-includes}
                    </includes>
                    <excludes>
                        ${jpa-excludes}
                    </excludes>
                    <addDefaultConstructor>true</addDefaultConstructor>
                    <enforcePropertyRestrictions>true</enforcePropertyRestrictions>
                    <sqlAction>${openjpa.sql.action}</sqlAction>
                    <sqlFile>${project.build.directory}/database.sql</sqlFile>
                    <connectionDriverName>com.mchange.v2.c3p0.ComboPooledDataSource</connectionDriverName>
                    <connectionProperties>
                        driverClass=${database.driver.name},
                        jdbcUrl=${database.connection.url},
                        user=${database.user},
                        password=${database.password},
                        minPoolSize=5,
                        acquireRetryAttempts=3,
                        maxPoolSize=20
                    </connectionProperties>
                </configuration>
                <executions>
                    <execution>
                        <id>mappingtool</id>
                        <phase>process-classes</phase>
                        <goals>
                            <goal>enhance</goal>
                        </goals>
                    </execution>
                </executions>
                <dependencies>
                    <dependency>
                        <groupId>log4j</groupId>
                        <artifactId>log4j</artifactId>
                        <version>1.2.12</version>
                    </dependency>
                    <dependency>
                        <!-- 
                          otherwise you get ClassNotFoundExceptions during 
                          the code coverage report run
                        -->
                        <groupId>net.sourceforge.cobertura</groupId>
                        <artifactId>cobertura</artifactId>
                        <version>1.9.2</version>
                    </dependency>
                    <dependency>
                        <groupId>c3p0</groupId>
                        <artifactId>c3p0</artifactId>
                        <version>${c3p0.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>mysql</groupId>
                        <artifactId>mysql-connector-java</artifactId>
                        <version>${mysql-connector.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>com.oracle</groupId>
                        <artifactId>ojdbc14</artifactId>
                        <version>${ojdbc.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>postgresql</groupId>
                        <artifactId>postgresql</artifactId>
                        <version>${postrgresql-jdbc.version}</version>
                    </dependency>
                </dependencies>
            </plugin>

You might have spotted a few maven properties which I later define in each project’s pom. That way I can keep my common configuration generic and still have a way to tweak the behaviour for each sub-project. Again a nice benefit: you can easily use mvn -Dsomeproperty=anothervalue to tweak those settings on the command line.

  • ${jpa-includes} for defining the comma-separated list of classes which should get enhanced, e.g. "mycomp/project/modulea/backend/*.class,mycomp/project/modulea/backend/otherstuff/*.class"
  • ${jpa-excludes}: the opposite of jpa-includes
  • ${openjpa.sql.action} to define what should be done during DB schema creation. This can be build to always create the whole DB schema (CREATE TABLE), or refresh for generating only ALTER TABLE statements for the changes. I’ll come back to this later.
  • ${database.driver.name} and the credentials properties are used to be able to run the schema creation against Oracle, MySQL and PostgreSQL (switched via maven profiles).

Creating the Database

For doing tests with a real database we of course need to create the schema first. We do NOT let JPA do any automatic database schema changes on JPA-startup. Doing so might unrecoverably trash your production database, so it’s always turned off!

Instead we manually trigger the SQL schema creation process via the Apache OpenJPA openjpa-maven-plugin (see the configuration above):

$> mvn openjpa:sql

Then we check the generated SQL in target/database.sql and copy it to the structure we have in each of our backend projects:

webapp1/be/src/main/sql/
├── mysql
│   ├── createdb.sql
│   ├── createindex.sql
│   ├── database.sql
│   └── schema_delta.sql
├── oracle
│   ├── createdb.sql
│   ├── createindex.sql
│   ├── database.sql
│   └── schema_delta.sql
└── postgres
    ├── createdb.sql
    ├── createindex.sql
    ├── database.sql
    └── schema_delta.sql

The following files are involved in the db setup:

createdb.sql

This file creates the database itself. It is optional, as not every database supports creating a whole database. In MySQL we just do the following:

DROP DATABASE IF EXISTS ProjextXDatabase;
CREATE DATABASE ProjextXDatabase CHARACTER SET utf8;
USE ProjextXDatabase;

In Oracle this is not that easy. It’s a major pain to drop and then set up a whole data store. A major problem is that you cannot easily access a data store which doesn’t exist anymore via Oracle’s JDBC driver. Instead, we just drop all the tables:

DROP TABLE MyTable CASCADE constraints PURGE;
DROP TABLE AnotherTable CASCADE constraints PURGE;
...

If you have a better idea, then please speak up 😉

database.sql

This is the exact 1:1 DDL/schema file we generated via JPA (in my case via the openjpa-maven-plugin’s mvn openjpa:sql mentioned above). It is simply copied over from target/database.sql with its content unchanged. It runs after the createdb.sql file.

createindex.sql

This file contains the initial index tweaks which were not generated in the DDL. In Oracle and PostgreSQL this file e.g. contains all the indices on foreign keys, because OpenJPA doesn’t generate them (I remember that Hibernate does, correct?). In MySQL we don’t need those because MySQL automatically adds indices for foreign keys itself.

But this is of course a good place to add all the performance tuning stuff you ever wanted 😉

schema_delta.sql

This one is a real gem! Once a project goes into production we do not generate full database schemas anymore! Instead we switch the openjpa-maven-plugin to the refresh mode. In this mode OpenJPA compares the entities with the state of the configured database and only generates ALTER TABLE and similar statements for the changes in target/database.sql. This works surprisingly well!

We then review the generated schema changes and append the content to src/main/sql/[dbvendor]/schema_delta.sql. Of course we also add clean comments about the product revision in which the change got made. That way an administrator just picks the last n entries from this file and is easily able to bring the production database to the latest revision.

Doing this step manually is very important! From time to time there are changes (renaming a column for example) which cannot be handled by the generated DDL. Such changes or small migration updates need to be maintained manually.

How to create the DB for my tests?

This one is pretty easy if you know the trick: We just make use of the sql-maven-plugin.

Here is the configuration I use in my project:

    <profiles>
        <!-- Default profile for surefire with MySQL: creates database, imports testdata and runs all unit tests -->
        <profile>
            <id>default</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>sql-maven-plugin</artifactId>
                        <configuration>
                            <driver>com.mysql.jdbc.Driver</driver>
                            <url>jdbc:mysql://localhost/</url>
                            <username>root</username>
                            <password/>
                            <escapeProcessing>false</escapeProcessing>

                            <srcFiles>
                                <srcFile>src/main/sql/mysql/createdb.sql</srcFile>
                                <srcFile>src/main/sql/mysql/database.sql</srcFile>
                                <srcFile>src/main/sql/mysql/schema_delta.sql</srcFile>
                                <srcFile>src/main/sql/mysql/createindex.sql</srcFile>
                                <srcFile>src/test/sql/mysql/testdata.sql</srcFile>
                            </srcFiles>
                        </configuration>

                        <executions>
                            <execution>
                                <id>setup-test-database</id>
                                <phase>process-test-resources</phase>
                                <goals>
                                    <goal>execute</goal>
                                </goals>
                            </execution>
                        </executions>

                        <dependencies>
                            <dependency>
                                <groupId>mysql</groupId>
                                <artifactId>mysql-connector-java</artifactId>
                                <version>${mysql-connector.version}</version>
                                <scope>runtime</scope>
                            </dependency>
                        </dependencies>
                    </plugin>
                </plugins>
            </build>
        </profile>

        <profile>
            <!-- that skips sql plugin and test!!! -->
            <id>skipSql</id>
        </profile>
        <!-- ... add profiles for Oracle and PostgreSQL accordingly -->
    </profiles>

Whenever you run your build, the database will be freshly set up in the process-test-resources phase. The database schema will then be exactly the same as in production!

Guess we are now basically ready to start hacking on our project!

The 2nd part will focus on how to handle JPA stuff in the application code. Stay tuned!

LieGrue,
strub

Is there a way to fix the JPA EntityManager?

Using JPA is easy for small projects but has well-hidden problems which are caused by some very basic design decisions. Quite a few of them stem from the fact that the EntityManager cannot be made Serializable. Although some JPA providers (e.g. Hibernate) claim that it is serializable, it isn’t!

Is the EntityManager Serializable?

The LazyInitializationException is a pretty bad beast if you have ever worked with EJB managed EntityManagers. That problem caused lots of people to look for alternatives. Two of the most prominent are JBoss Seam2 if you are working with the JBoss stack and Apache MyFaces Orchestra for Spring applications.

The basic problems are summed up very well in the still largely correct Apache MyFaces Orchestra documentation:
Apache MyFaces Orchestra Persistence explanation

If you read through the whole page, you will see the TODOs at the very bottom of the page:

TODO: is the persistence-context serializable? Are all persistent objects in the context always serializable?

The simple answer is: NO not at all! Neither the EntityManager nor the state in the entities are Serializable as per the current JPA specification!

Why is the EntityManager not Serializable

There are a few reasons:

1. Pessimistic Locking

The biggest blocker first: JPA doesn’t only support optimistic locking but also pessimistic locking. You can declare this either in your persistence.xml or programmatically via the LockModeType parameter of many functions.

EntityManager#find(java.lang.Class entityClass, java.lang.Object primaryKey, LockModeType lockMode) 
EntityManager#lock(java.lang.Object entity, LockModeType lockMode) 
...

But if you ever use pessimistic locking (a real hard lock in the database), the lock is held by the underlying database connection. This connection is bound to the EntityManager and cannot be ‘transferred’ to another EntityManager without losing the lock.

2. Id and Version fields are optional

To use the optimistic locking approach, a primary key plus some ‘version’ field must be used in the entity:

 UPDATE tableX SET [somevalues], version=:oldversion+1 WHERE id=:myId AND version=:oldversion

Obviously this update can only succeed once. Trying to update the row a second time with the same :oldversion will not find any database entry, because version=:oldversion is not true anymore.
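To make the version check tangible, here is a minimal in-memory sketch of the conditional UPDATE above in plain Java, without any database or JPA involved. The class `VersionedRowStore` and its methods are hypothetical names for illustration; a real JPA provider issues the SQL statement instead and looks at the returned update count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for 'tableX' (id -> row). It mimics
// "UPDATE ... SET ..., version=:oldversion+1 WHERE id=:myId AND version=:oldversion".
class VersionedRowStore {

    static final class Row {
        String value;
        long version;

        Row(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new Row(value, 1L));
    }

    synchronized long currentVersion(long id) {
        return rows.get(id).version;
    }

    /**
     * Returns the number of updated rows (0 or 1), like JDBC's executeUpdate().
     * The row only gets updated if its version still equals oldVersion.
     */
    synchronized int update(long id, long oldVersion, String newValue) {
        Row row = rows.get(id);
        if (row == null || row.version != oldVersion) {
            return 0; // WHERE clause did not match: somebody else changed the row
        }
        row.value = newValue;
        row.version = oldVersion + 1;
        return 1;
    }
}
```

JPA treats an update count of 0 as an optimistic locking conflict (OptimisticLockException). This only works as long as the provider still knows the old version, which is exactly the information that is at stake when entities get serialized.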

When you use optimistic locking in JPA, your table will always have such a ‘version’ column. But you are not required to map it as a field in your entity yet! If it is not mapped, this information will not be transported when you serialize the entity!

To fully support optimistic locking, those entities will need mandatory @Id and @Version fields.

3. Losing the entity state information

As outlined in a previous blog post, every JPA entity gets ‘enhanced’ with some magic code which tracks _loaded and _dirty state information. Those bit flags track which parts of the entity have been changed or lazily fetched.

The problem in this area is mostly caused by the JPA spec, which by default prevents JPA providers from serializing the ‘enhanced entities’ and instead requires serializing only the ‘native’ information. At least that seems to be the common understanding of the following paragraph in the JPA spec:

“Serializing entities and merging those entities back into a persistence context may not be interoperable across vendors when lazy properties or fields and/or relationships are used.
A vendor is required to support the serialization and subsequent deserialization and merging of detached entity instances (which may contain lazy properties or fields and/or relationships that have not been fetched) back into a separate JVM instance of that vendor’s runtime, where both runtime instances have access to the entity classes and any required vendor persistence implementation classes.”

Of course, most JPA providers offer a way to enable the serialization of the state fields. In OpenJPA, just add the following magic properties to your persistence.xml:

<property name="openjpa.DetachState" value="loaded(DetachedStateField=true)"/>
<property name="openjpa.Compatibility" value="IgnoreDetachedStateFieldForProxySerialization=true"/>

This will also serialize the _loaded and _dirty bit flags along with your entity.

The problem with having the EntityManager not Serializable

Well, this one is easy:

  • You cannot store the EntityManager in a Conversation
  • You cannot store the EntityManager in a Session
  • You cannot store the EntityManager in a JSF View
  • No clustering, because clustering means that you need to serialize the state

What can you do today?

Today the only working solution is the entitymanager-per-request pattern: basically creating a @RequestScoped EntityManager, e.g. via a CDI @Produces method, for each and every request. That also means that you need to manually merge detached entities back in the callback; if you use JSF, that happens in your action method.

 

How to fix the JPA EntityManager in the future?

Here are my thoughts about how we can do better in the future. Please note that there is a project called Avaje Ebean which is not JPA compliant but has already successfully implemented these ideas.

Provide an OptimisticEntityManager

public interface OptimisticEntityManager extends EntityManager, Serializable

The most important change here is that it implements the java.io.Serializable interface.
This OptimisticEntityManager should throw a NonOptimisticModeException whenever one tries to execute an operation which requires a non-optimistic LockModeType, or any other operation which creates a lock or non-serializable behaviour.

There should be a way to explicitly request an OptimisticEntityManager, e.g. via

OptimisticEntityManager EntityManagerFactory#createOptimisticEntityManager(); 
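The proposed check could look roughly like the following sketch. Everything in it is hypothetical: OptimisticEntityManager and NonOptimisticModeException are only proposals, `OptimisticModeGuard` is a made-up helper, and to stay self-contained the sketch uses a local `LockMode` enum that mirrors the constants of JPA 2.0's `LockModeType` instead of depending on the JPA API.

```java
// Local mirror of the constants in javax.persistence.LockModeType (JPA 2.0).
enum LockMode {
    READ, WRITE, OPTIMISTIC, OPTIMISTIC_FORCE_INCREMENT,
    PESSIMISTIC_READ, PESSIMISTIC_WRITE, PESSIMISTIC_FORCE_INCREMENT, NONE
}

// Hypothetical exception proposed in the text.
class NonOptimisticModeException extends RuntimeException {
    NonOptimisticModeException(String message) {
        super(message);
    }
}

// Hypothetical guard an OptimisticEntityManager could call at the start of
// find(..., lockMode), lock(...), refresh(..., lockMode), etc.
final class OptimisticModeGuard {
    private OptimisticModeGuard() {
    }

    static void checkLockMode(LockMode mode) {
        if (mode == null) {
            return; // no explicit lock mode requested
        }
        switch (mode) {
            case PESSIMISTIC_READ:
            case PESSIMISTIC_WRITE:
            case PESSIMISTIC_FORCE_INCREMENT:
                // a pessimistic lock would pin the EntityManager to one database connection
                throw new NonOptimisticModeException(
                        "LockModeType." + mode + " is not allowed in an OptimisticEntityManager");
            default:
                // NONE, READ, WRITE and the OPTIMISTIC* modes only rely on the @Version check
                break;
        }
    }
}
```

Failing fast at the call site keeps the serializability guarantee intact: an operation that would hold a database lock never gets the chance to run.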

Make @Id and @Version mandatory for those Entities

This will solve the problem with losing the optimistic lock information when serializing.

Define _loaded and _dirty Serialization

The future JPA spec could either clarify that serialization is more important than JPA vendor interoperability (who uses 2 different JPA providers in the same environment anyway?), or simply specify that the 2 bit flags can be passed along in the serialized entity and how they should behave.
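A minimal sketch of what passing the two bit flags along could mean, using plain Java serialization. The `Customer` class and its `loaded`/`dirty` BitSets are hypothetical stand-ins for the fields an enhancer would generate (the real field names and types are provider-specific). The point is only that the flags survive the round trip, so the receiving side still knows which fields were fetched and which were changed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.BitSet;

// Hypothetical 'enhanced' entity: field index 0 = name, 1 = address (lazy).
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;

    static final int NAME = 0;
    static final int ADDRESS = 1;

    Long id;         // would be the @Id field
    Integer version; // would be the @Version field
    String name;
    String address;  // lazy field, possibly never fetched

    // state normally maintained by the enhancer
    final BitSet loaded = new BitSet();
    final BitSet dirty = new BitSet();

    void setName(String name) {
        this.name = name;
        loaded.set(NAME);
        dirty.set(NAME);
    }
}

final class SerializationRoundTrip {
    private SerializationRoundTrip() {
    }

    /** Serializes and deserializes the object, as happens on session replication. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T object) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```

After deserialization a provider could merge only the fields flagged as dirty and use the version for the optimistic check, without guessing whether an unloaded lazy field was deliberately set to null.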

Please tell me what you think! Did we miss something? It’s not an easy move, but so far I think it is doable!

PS: Thanks to Shane Bryzak and Jason Porter for helping me get rid of at least the worst English grammar and wording issues. Hope you folks got the gist regardless of my bad English 😉

Unit Testing Strategies for CDI based projects

Even after 2 years of CDI going public there are still frequent questions about how to unit test CDI based applications. Well, here it goes.

1. Arquillian, the heavyweight champion

Arquillian is the core part of the EE testing effort of JBoss. The idea is to write a unit test exactly once and run it on the target platform via some container integration connectors.

What do Arquillian tests look like?

I will not give a detailed introduction here, but I’d like to mention that an Arquillian test usually contains a @Deployment section which packages all the classes under test into a dedicated temporary archive, like a JAR, WAR or EAR, and uses this temporary artifact to run the tests.

There are basically 2 ways to run Arquillian tests. I’ll not go into details but only give a very rough (and probably not too exact) overview:

Remote/Managed style Arquillian tests

In this case the Arquillian tests will be started in an external Java VM, e.g. a running JBossAS7 or GlassFish-3.2. If you start your unit test locally, it will get ‘deployed’ to the other VM and executed over there. You can still debug through it as if it were running locally, which is made possible by the Java debugging and profiling API. Some container connectors also allow running the tests locally ‘as client’.

The Good: this is running exactly on the target platform you do your real stuff on.
The Bad: Slower than native unit tests. No full control over the lifecycle, etc. You cannot, e.g., simulate multiple servlet requests in one unit test.

Embedded style Arquillian tests

Instead of deploying your Arquillian test to another VM you are starting a standalone CDI or embedded EE container in the current Java VM.

The Good: Much faster than going remote. And you don’t need any server running.
The Bad: Sometimes this leads to ClassLoader isolation problems between your @Deployment and your unit test. This causes problems when you e.g. like to unit test classes which start an EntityManager. The reason is that e.g. any persistence.xml will get picked up twice: once from your regular classpath and a second time from the @Deployment.

You can read more about it in the JBoss Arquillian documentation (thanks to Aslak Knutsen for the links):

2. DeltaSpike CdiControl, the featherweight champion

The method to unit test CDI based programs I’d like to introduce next uses the Apache DeltaSpike CdiControl API introduced in my previous post. It is based on the ideas and experience I gained with the Apache OpenWebBeans CdiTest module, which I created and have been using heavily for 3 years.

First of all, you will need a META-INF/beans.xml in your test classpath. In the end, all your unit tests will become CDI-enabled classes! If you use Maven, just create an empty file:

$> touch src/test/resources/META-INF/beans.xml

Instead of creating a @Deployment like Arquillian does, we simply take all the test dependencies and scan them. Of course this means that starting the CDI container for a unit test always has to scan all classes in JARs which contain a META-INF/beans.xml marker file. On the other hand, we save ourselves from creating the @Deployment and always test the ‘real thing’.

Introducing the test base class

The first step is to create a base class for all your unit tests. I will break the full class down into distinct blocks and explain what happens. I’m using TestNG in this case, but JUnit tests will look pretty similar.

The main thing we need is the DeltaSpike CdiContainer. CDI containers don’t like running concurrently within the same ClassLoader. Thus we keep our CdiContainer in a static member variable and share it across parallel tests.

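public abstract class CdiContainerTest {
    protected static CdiContainer cdiContainer;

    // reference counter: incremented in setUp(), decremented in tearDown()
    protected static int containerRefCount = 0;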

Now we also need to initialize and use it for each method under test. This is also a good place to set the ProjectStage the test should execute in.
After boot()-ing the CdiContainer we also start all the contexts.
If the container is already started, we just clean the contextual instances for the current thread.

    @BeforeMethod
    public final void setUp() throws Exception {
        containerRefCount++;

        if (cdiContainer == null) {
            ProjectStageProducer.setProjectStage(ProjectStage.UnitTest);

            cdiContainer = CdiContainerLoader.getCdiContainer();
            cdiContainer.boot();
            cdiContainer.getContextControl().startContexts();
        }
        else {
            // clean the Instances by restarting the contexts
            cdiContainer.getContextControl().stopContexts();
            cdiContainer.getContextControl().startContexts();
        }
    }

We also do a proper cleanup after each method finishes. In particular, we clean up all @RequestScoped beans to ensure that e.g. disposer methods for @RequestScoped EntityManagers get invoked.

    
    @AfterMethod
    public final void tearDown() throws Exception {
        if (cdiContainer != null) {
            cdiContainer.getContextControl().stopContext(RequestScoped.class);
            cdiContainer.getContextControl().startContext(RequestScoped.class);
            containerRefCount--;
        }
    }

Another little trick: in the @BeforeClass method we ensure that the container is booted and then do some CDI magic. We get the InjectionTarget and fill all InjectionPoints of the concrete unit test subclass via the inject() method.
This allows us to use @Inject in our unit test classes which extend our CdiContainerTest.

    @BeforeClass
    public final void beforeClass() throws Exception {
        setUp();
        cdiContainer.getContextControl().stopContext(RequestScoped.class);
        cdiContainer.getContextControl().startContext(RequestScoped.class);

        // perform injection into the very own test class
        BeanManager beanManager = cdiContainer.getBeanManager();

        CreationalContext creationalContext = beanManager.createCreationalContext(null);

        AnnotatedType annotatedType = beanManager.createAnnotatedType(this.getClass());
        InjectionTarget injectionTarget = beanManager.createInjectionTarget(annotatedType);
        injectionTarget.inject(this, creationalContext);
    }

After the suite is finished we need to properly shutdown() the container.

    @AfterSuite
    public synchronized void shutdownContainer() throws Exception {
        if (cdiContainer != null) {
            cdiContainer.shutdown();
            cdiContainer = null;
        }
    }

}

You can see the full class in my lightweightEE sample (work in progress):
CdiContainerTest.java

The usage

Just write your unit test class and extend the CdiContainerTest class.

public class CustomerServiceTest extends CdiContainerTest
{

You can even use injection in your unit tests, because they get resolved in the @BeforeClass method of the base class.

    private @Inject CustomerService custSvc;
    private @Inject UserService usrSvc;
...

And just use the injected resources in your test

    @Test(groups = "createData")
    public void testCustomerCreation() throws Exception {
        Customer cust1 = new Customer();
        ...
        custSvc.createCustomer(cust1);
    }

Btw, to use the latest Apache DeltaSpike snapshots you might need to configure the following maven repository in your project or ~/.m2/settings.xml :

  <repositories>
    <repository>
      <id>people.apache.snapshots</id>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

Limits

The CdiContainer based unit tests work really well if your project under test does not rely on EE container resources like JNDI or JMS. It might work for e.g. EJB if you use Apache OpenWebBeans plus Apache OpenEJB, or JBoss Weld plus JBoss microAS.
I would need to test this out to give more detailed information.
As a general rule of thumb: once you need heavyweight EE resources in your backend, you should go the Arquillian route.

If you only use JNDI for configuring a JPA DataSource, then please look at the documentation of Apache MyFaces CODI ConfigurableDataSource. It gives you much more flexibility and removes the need for the almost always broken JNDI configuration in your persistence.xml. Instead you can use plain CDI mechanics to configure your database settings.

Summary

If you like to do more end-to-end or integration testing, then Arquillian is worth looking at. If you just need a quick unit test for your CDI based project, then take a look at my CdiContainerTest base class and grab the ideas you like. And don’t forget to provide feedback 😉

Why is OpenWebBeans so fast?

The Speed-King

Apache OpenWebBeans [1] is considered the Speed-King of dependency injection containers. A blog post by Gerhard Petracek showed this pretty nicely [2][3]. But why is this? During performance tests for the big EE application I have been developing with colleagues since late 2009 (40k users, 5 million pages/day), we came across a few critical points.

1. Static vs Dynamic

The CDI specification requires the container to scan all classes in locations which contain a META-INF/beans.xml marker file and to store them in a big list of Bean instances. This information only gets scanned at startup, checked for inconsistencies and further optimized. At runtime all the important information is immediately available without any further processing.

As far as I know, this is different in Spring, which allows some kind of ‘dynamic’ reconfiguration: you can switch some parts of the bean configuration programmatically at runtime. This might be neat for some use cases (e.g. scripting integration) but definitely requires the container to do much more work at runtime. It also prevents some kinds of very aggressive caching.

2. Reduce String Operations

In dependency injection containers you often store evaluation results in Maps. We quite often have code similar to the following example:

java.lang.reflect.Method met = getInjectionMethod();
Class clazz = getInjectionClass();
String key = clazz.getName() + "/" + met.getName() + parametersString(met.getParameterTypes());
Object o = map.get(key);

This is obviously nothing I would ever write in a piece of performance-critical code! The first step is to use a StringBuilder instead:

Using StringBuilder

StringBuilder sb = new StringBuilder();
sb.append(clazz.getName()).append('/').append(met.getName()).append('/');
appendParameterTypes(sb, met.getParameterTypes());
String key = sb.toString();

This is now twice as fast. But still way too slow for us 😉

Increase StringBuilder capacity

If I remember the sources correctly, StringBuilder by default has an internal buffer capacity of 16 characters, which gets (roughly) doubled every time the buffer overflows. In our example above this usually leads to expanding the internal capacity 3 or 4 times (depending on the length of the parameters, etc). Every time the capacity gets extended, StringBuilder has to allocate a new array and copy the old content over, the Java equivalent of alloc + memcpy(newbuffer, oldbuffer) + free(oldbuffer). We can reduce this by allocating more space up front:

StringBuilder sb = new StringBuilder(200);
sb.append(clazz.getName()).append('/').append(met.getName()).append('/');
...

This already helps a lot; the code now runs 3x as fast as with the default StringBuilder(). The downside is that every key now needs a buffer of at least 200 characters. And we are still far from what’s possible performance-wise.
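You can observe the buffer growth yourself via `StringBuilder#capacity()`. The following sketch counts how often the internal buffer gets reallocated while building a key like the one above. `CapacityDemo` is a hypothetical helper, the key parts are made-up sample values, and the exact growth steps depend on the JDK version, so treat the numbers as illustrative only.

```java
// Counts buffer reallocations of a StringBuilder while building a cache key.
final class CapacityDemo {
    private CapacityDemo() {
    }

    /** Appends each part followed by '/' and returns how often the capacity had to grow. */
    static int countGrowths(StringBuilder sb, String... parts) {
        int growths = 0;
        for (String part : parts) {
            growths += appendAndCount(sb, part);
            growths += appendAndCount(sb, "/");
        }
        return growths;
    }

    private static int appendAndCount(StringBuilder sb, String s) {
        int before = sb.capacity();
        sb.append(s);
        // a changed capacity means a new, larger array was allocated and the old content copied
        return sb.capacity() != before ? 1 : 0;
    }
}
```

With the sample parts used in the test (about 90 characters in total), the default builder grows three times on current JDKs, while `new StringBuilder(200)` never reallocates.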

Hash based solution

Another solution looks like the following:

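long key = clazz.getName().hashCode() + 29L * met.getName().hashCode();
for (Class<?> t : met.getParameterTypes()) {
  // multiply the running key so that foo(int, String) and foo(String, int) get different keys
  key = 29L * key + t.hashCode();
}
Object o = map.get(key);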

This performs much faster than any String based solution: up to 50 times for 3 String operands, to be more precise, and even more if more Strings need to be concatenated. Not only is key creation much faster because no Strings need to be allocated (only the long gets boxed for the map lookup), map access gets faster as well. The downside is that the hash keys of different methods might clash.
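The clash is not only theoretical. `String#hashCode()` is specified as `s[0]*31^(n-1) + ... + s[n-1]`, so for example "Aa" and "BB" both hash to 2112. The following sketch uses only the first part of the key formula above (class name and method name); the class `HashKeys` and the method names "Aa" and "BB" are hypothetical. Two different methods of the same class end up with the same `long` key, so a plain map lookup silently returns the wrong cached entry.

```java
import java.util.Map;

// First part of the hash-based key: class name and method name only.
final class HashKeys {
    private HashKeys() {
    }

    static long key(String className, String methodName) {
        return className.hashCode() + 29L * methodName.hashCode();
    }

    // Looks up a cached value by hash key only, like map.get(key) in the text.
    static <V> V lookup(Map<Long, V> cache, String className, String methodName) {
        return cache.get(key(className, methodName));
    }
}
```

A map keyed by such a number cannot tell the two methods apart, because no equals() check on the original identity ever takes place.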

Own Key Object

The solution we now use in OWB is to create our own key object which implements hashCode() and equals() in an optimized way.
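The post does not show OWB’s actual key class, so the following is only a sketch of the idea with a hypothetical `MethodKey` class. It keeps references to the `Class` and `Method` objects, computes the hash once in the constructor, and in `equals()` compares the cheap things first. A hash clash then only costs an extra `equals()` call instead of returning a wrong value, and no key String has to be built. Note that `Class#getMethod` returns a fresh `Method` copy on each call, so the sketch compares methods via `Method#equals` rather than by identity.

```java
import java.lang.reflect.Method;

// Hypothetical cache key for (injection class, injection method).
final class MethodKey {
    private final Class<?> clazz;
    private final Method method;
    private final int hash; // computed once, reused for every map lookup

    MethodKey(Class<?> clazz, Method method) {
        this.clazz = clazz;
        this.method = method;
        this.hash = 31 * clazz.hashCode() + method.hashCode();
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MethodKey)) {
            return false;
        }
        MethodKey other = (MethodKey) o;
        // cheap checks first; a Class object is unique per ClassLoader, so == is enough
        return hash == other.hash
                && clazz == other.clazz
                && method.equals(other.method); // compares name, parameter and return types
    }
}
```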

3. ELResolver tuning

Expression Language based programming is used extensively in JSP and JSF pages and also in other frameworks. It is fairly easy to use, pluggable and integrates very well with existing frameworks. But many people are not aware that the EL integration is pretty expensive. It works by going down a chain of registered ELResolver implementations until one of them finds the requested object.

Nested EL calls

We will explain the impact of the ELResolver by going through a single EL invocation. Imagine the following line in a JSF page:

<inputText value="#{shoppingCart.user.address.city}">

How many ELResolver#getValue() invocations do you think get executed?

The answer is: many! Let’s say we have 10 ELResolvers in the chain. This is not a fantasy value but comes pretty close to reality. While evaluating the expression, the EL integration splits it on the dots (‘.’) and starts resolving with the leftmost term (“shoppingCart”). This EL part is passed down the ELResolver chain until one ELResolver knows the given name. Since the cheap ELResolvers with a high hit ratio are usually put first, our CDI ELResolver will be somewhere in the middle. This means that we already had approximately 5 ELResolver invocations before we find the bean… And the same walk down the chain happens again for each of the remaining parts (“user”, “address”, “city”), this time with the previously resolved object as the base.

The answer: our single EL expression will already perform roughly 30 ELResolver invocations!
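The arithmetic can be reproduced with a tiny simulation of a resolver chain. All positions are assumptions, not measurements of a real JSF setup: 10 resolvers, the CDI resolver sits at the 5th position (index 4) and is the only one resolving top-level names, and the resolver for bean properties sits at the 8th position (index 7). Like a composite ELResolver, the hypothetical `ElChainSimulation` walks the chain for every part of the expression until one resolver marks the property as resolved.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates walking an ELResolver chain and counts getValue() calls.
final class ElChainSimulation {

    interface Resolver {
        /** Returns true if this resolver handled the (base, property) pair. */
        boolean tryResolve(Object base, String property);
    }

    private final List<Resolver> chain = new ArrayList<>();
    private int invocations;

    /**
     * @param size          number of resolvers in the chain
     * @param topLevelIndex position of the resolver that knows top-level names (e.g. CDI)
     * @param propertyIndex position of the resolver that handles bean properties
     */
    ElChainSimulation(int size, int topLevelIndex, int propertyIndex) {
        for (int i = 0; i < size; i++) {
            final int position = i;
            chain.add((base, property) -> {
                invocations++;
                // top-level names have no base object; nested properties do
                return base == null ? position == topLevelIndex : position == propertyIndex;
            });
        }
    }

    /** Evaluates e.g. "shoppingCart.user.address.city" and returns the number of resolver calls. */
    int evaluate(String expression) {
        invocations = 0;
        Object base = null;
        for (String part : expression.split("\\.")) {
            for (Resolver resolver : chain) {
                if (resolver.tryResolve(base, part)) {
                    break;
                }
            }
            base = part; // stand-in for the object returned for this part
        }
        return invocations;
    }
}
```

With these assumptions the expression costs 5 calls for “shoppingCart” plus 8 calls for each of the three properties, i.e. 29 calls, in line with the rough estimate of 30.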

But what can we do to improve the performance of our very own WebBeansELResolver at least?

EL caching

The OpenWebBeans WebBeansELResolver (or any CDI container’s ELResolver) will typically execute the following code to get a bean:

Set<Bean<?>> beans = beanManager.getBeans(name);
Bean<?> bean = beanManager.resolve(beans); // null if no bean has this name
CreationalContext<?> creationalContext = beanManager.createCreationalContext(bean);
Object contextualReference = beanManager.getReference(bean, Object.class, creationalContext);

This gives us quite a few things we can cache.

a.) we can cache the found Bean.

b.) for NormalScoped beans (most scopes, except @Dependent) we can even cache the contextualReference because in this case we always will get a Proxy anyway (see the spec: ‘Contextual Reference’).

Negative caching

Searching for a bean is expensive. Searching for it and NOT finding anything is even more expensive!

To prevent our ELResolver from doing this over and over again, we also cache the misses. This sped up the OpenWebBeans EL integration big time.
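A minimal sketch of such a cache, assuming a hypothetical `lookup` function that stands in for the expensive `getBeans()`/`resolve()` sequence; `NegativeCachingResolver` is a made-up name as well. Since `ConcurrentHashMap` does not accept `null` values, a miss is stored as `Optional.empty()`, so it gets cached just like a hit.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Caches both hits and misses of an expensive name -> bean lookup.
final class NegativeCachingResolver<T> {
    private final ConcurrentMap<String, Optional<T>> cache = new ConcurrentHashMap<>();
    private final Function<String, T> lookup; // hypothetical expensive lookup, may return null

    NegativeCachingResolver(Function<String, T> lookup) {
        this.lookup = lookup;
    }

    /** Returns the bean for the given EL name, or null if there is none. */
    T resolve(String name) {
        // Optional.empty() is the 'not found' marker, so misses get cached too
        return cache.computeIfAbsent(name, n -> Optional.ofNullable(lookup.apply(n)))
                    .orElse(null);
    }
}
```

Caching misses forever is only safe because the set of CDI beans is fixed after startup; in a container that allows dynamic reconfiguration, this cache would need to be invalidated.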

4. Proxy tuning

One of the most elaborate parts of OWB is the possibility to configure custom proxies for any scope.

The job of a Contextual Reference

Proxies in CDI are used for implementing Interceptors and Serialization, and mainly for resolving the underlying ‘Contextual Instance’ of a ‘Contextual Reference’ on the fly. You can read more about this in the ‘CDI Introduction’ article of the latest JavaTechJournal [4]. For this last job, the proxy looks up the correct Contextual Instance on each method invocation and then redirects the invocation to the resolved instance. In OpenWebBeans this is done in the NormalScopedBeanInterceptorHandler.

But is it really necessary to always perform this expensive lookup?

Caching resolved Contextual Instances

Let’s take a look at a @RequestScoped bean. Once the servlet request has started, the resolved contextual instance doesn’t change anymore until the end of the request. So it should be possible to ‘cache’ this contextual instance and clean the cache when the servlet request ends. OpenWebBeans does this by providing its own Proxy-MethodHandler for @RequestScoped beans, the RequestScopedBeanInterceptorHandler, which stores this info in a ThreadLocal:

private static ThreadLocal<HashMap<OwbBean, CacheEntry>> cachedInstances 
    = new ThreadLocal<HashMap<OwbBean, CacheEntry>>();
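The principle can be sketched in a few lines of plain Java. `RequestCache`, `resolve` and `requestEnded` are hypothetical names; the `Supplier` stands in for the expensive context lookup. The sketch assumes that `requestEnded()` gets triggered at the end of each request (e.g. from a ServletRequestListener), otherwise pooled threads would keep serving stale instances.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Per-thread cache of resolved contextual instances for the current request.
final class RequestCache {
    private static final ThreadLocal<Map<String, Object>> CACHED_INSTANCES =
            ThreadLocal.withInitial(HashMap::new);

    private RequestCache() {
    }

    /** Returns the cached instance for beanId, resolving it at most once per request and thread. */
    @SuppressWarnings("unchecked")
    static <T> T resolve(String beanId, Supplier<T> contextLookup) {
        return (T) CACHED_INSTANCES.get().computeIfAbsent(beanId, id -> contextLookup.get());
    }

    /** Must be called when the request ends; removes the whole map from the current thread. */
    static void requestEnded() {
        CACHED_INSTANCES.remove();
    }
}
```

A plain `HashMap` is sufficient here because each map is only ever touched by its own thread.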

An @ApplicationScoped bean in a WebApp doesn’t change its instance at all once we have resolved it. Thus we use an ApplicationScopedBeanInterceptorHandler [5] to cache even more aggressively.

Cache your own Scopes

As Gerhard’s blog post [2] shows, we can also use an OWB feature to configure the built-in as well as custom Proxy-MethodHandlers for any given scope. The configuration is done via OpenWebBeans’ own configuration mechanism, explained in a previous blog post.

Simply create a file META-INF/openwebbeans/openwebbeans.properties which contains a content similar to the following:

org.apache.webbeans.proxy.mapping.javax.enterprise.context.ApplicationScoped=org.apache.webbeans.intercept.ApplicationScopedBeanInterceptorHandler

The general format is:

org.apache.webbeans.proxy.mapping.[fully qualified scope name]=[proxy method-handler classname]
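To illustrate the key format, here is how such `[prefix].[scope]=[handler]` entries can be picked apart with `java.util.Properties`. This is only a sketch of the naming scheme, not OWB’s own configuration code; `ProxyMappingReader` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Extracts 'scope annotation -> proxy method-handler' pairs from openwebbeans.properties content.
final class ProxyMappingReader {
    static final String PREFIX = "org.apache.webbeans.proxy.mapping.";

    private ProxyMappingReader() {
    }

    /** Returns fully qualified scope name -> proxy method-handler class name. */
    static Map<String, String> read(String propertiesContent) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesContent));
        Map<String, String> mappings = new LinkedHashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                // everything after the prefix is the scope annotation's class name
                mappings.put(key.substring(PREFIX.length()), props.getProperty(key));
            }
        }
        return mappings;
    }
}
```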

Summary

Why the hell did we put so much time and tricks into Apache OpenWebBeans?

The answer is easy: our application shows the study codes of curricula on a single page: up to 1600 rows in an <h:dataTable>, resulting in more than 450,000 EL invocations! With the other containers we tried, rendering this page took 6 seconds.

With Apache Tomcat + Apache OpenWebBeans + Apache MyFaces-2.1.x [6] + JUEL [7] + Apache OpenJPA [8] we are now down to 350ms for this very page …

have fun and LieGrue,
strub

PS: we already explained a few of our tricks to our friends from the Weld community. This resulted in their @RequestScoped beans getting 3 times faster and now being almost as fast as in OWB 😉

[1] Apache OpenWebBeans
[2] os890 blog 1
[3] os890 blog 2
[4] Java-Tech-Journal CDI special
[5] ApplicationScopedBeanInterceptorHandler.java
[6] Apache MyFaces
[7] JUEL EL-2.2 Implementation
[8] Apache OpenJPA

Control CDI Containers in SE and EE

The Problem

Have you ever tried to boot a CDI container like JBoss Weld or Apache OpenWebBeans in Java SE? While doing this you will end up knee-deep in the implementation code of the container you are using! Not only is this a complicated task, but your code will also end up being non-portable because you need to invoke container-specific methods.

Where do I need this at all?

Booting a CDI container is not only useful if you like to hack Java SE apps, like a standalone SWT application. It is also very valuable if you can boot a CDI container for the unit tests of your business services and you don’t like to setup Arquillian for that task.

The Solution – Apache DeltaSpike CdiControl

The Apache DeltaSpike project [1][2] is a collection of CDI-Extensions which are created by a large community of CDI enthusiasts. It consists of most of the Apache MyFaces CODI and JBoss Seam3 community members, plus other high-profile experts in this area. The functionality of DeltaSpike is growing with each day!

One of the many features which DeltaSpike already provides at this early stage is a way to boot any CDI container and control its Context lifecycles via a few very simple interfaces [3]. Your code will end up being completely independent of the CDI implementation you use!

There are currently two implementations of this API, one for Apache OpenWebBeans, the other one for JBoss Weld. Both also successfully passed a small internal TCK test suite. The respective implementation simply gets activated by putting the correct impl-JAR into the classpath.

Maven Integration

For an Apache Maven project, you can just add the cdictrl-api and one of the impl JARs as dependencies in your pom.xml.

For using it with Apache OpenWebBeans:

<dependency>
    <groupId>org.apache.deltaspike.cdictrl</groupId>
    <artifactId>deltaspike-cdictrl-api</artifactId>
    <version>${deltaspike.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.deltaspike.cdictrl</groupId>
    <artifactId>deltaspike-cdictrl-owb</artifactId>
    <version>${deltaspike.version}</version>
</dependency>

For JBoss Weld it’s almost the same, just with the weld impl jar:

<dependency>
    <groupId>org.apache.deltaspike.cdictrl</groupId>
    <artifactId>deltaspike-cdictrl-api</artifactId>
    <version>${deltaspike.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.deltaspike.cdictrl</groupId>
    <artifactId>deltaspike-cdictrl-weld</artifactId>
    <version>${deltaspike.version}</version>
</dependency>

Note: Those features will get released with deltaspike-0.2-incubating. In the meantime the deltaspike.version is 0.2-incubating-SNAPSHOT and it is available in the Apache snapshots repository:
https://repository.apache.org/content/repositories/snapshots/org/apache/deltaspike/cdictrl/deltaspike-cdictrl-api/0.2-incubating-SNAPSHOT/

How to use the API

There are basically two parts:

  1. The CdiContainer Interface will provide you with a way to boot and shutdown the CDI Container in JavaSE apps.
  2. The ContextControl interface provides a way to control the lifecycle of the built-in Contexts of the CDI container.

CdiContainer usage

You can use the CdiContainerLoader as a simple factory to gain access to the underlying CdiContainer implementation. If you like to boot a CDI container for a unit test or in a Java SE application then just use the following fragment:

// this will give you a CdiContainer for Weld or OWB, depending on the jar you added
CdiContainer cdiContainer = CdiContainerLoader.getCdiContainer();

// now we gonna boot the CDI container. This will trigger the classpath scan, etc
cdiContainer.boot();

// and finally we like to start all built-in contexts
cdiContainer.getContextControl().startContexts();

// now we can use CDI in our SE application. 
// And there is not a single line of OWB or Weld specific code in your project!

// finally we gonna stop the container 
cdiContainer.shutdown();

Pretty much self-explanatory, isn’t it? Of course, those 2 classes are of no interest for Java EE applications, since the CDI container already gets properly booted and shut down by the Servlet container integration.

ContextControl usage

The ContextControl interface allows you to start and stop built-in standard Contexts like @RequestScoped, @ConversationScoped, @SessionScoped, etc. It is provided as a @Dependent bean and can be injected in the classic CDI way. This is not only usable in Java SE projects but also very helpful in Servlets and Java EE containers!

The following samples should give you an idea about the power of this tool:

Restarting the RequestContext in a unit test

I pretty frequently had the problem that I needed to test my classes with attached and also with detached JPA entities. In most of our big real world projects we are using the entitymanager-per-request approach [4] and thus have a producer method which creates a @RequestScoped EntityManager. Since a single unit test is usually treated as one ‘request’ I had problems detaching my entities. With the ContextControl this is no problem anymore as the following code fragment shows:

@Test
public void testMyBusinessLogic() {
  doSomeJpaStuff();
  MyEntity me = em.find(...);
  
  ContextControl ctxCtrl = BeanProvider.getContextualReference(ContextControl.class);

  // stopping the request context will dispose the @RequestScoped EntityManager
  ctxCtrl.stopContext(RequestScoped.class);

  // and now immediately restart the context again
  ctxCtrl.startContext(RequestScoped.class);

  // the entity 'me' is now in a detached state!
  doSomeStuffWithTheDetachedEntity(me);
}

Attaching a Request Context to a new thread in EE

Everyone who has tried to access @RequestScoped CDI beans in a new Thread created in a Servlet or any other Java EE environment has surely experienced the same pain: accessing the @RequestScoped bean results in a ContextNotActiveException. How come? Well, the Request Context usually gets started for a particular thread via a simple ServletRequestListener. The problem is obvious: no servlet request means that there is no Request Context for the Thread! But this sucks if you like to reuse your nice business services in e.g. a Quartz Job. The ContextControl can help you in those situations as well:

public class CdiJob implements org.quartz.Job {
  public void execute(JobExecutionContext context) throws JobExecutionException {
    ContextControl ctxCtrl = 
      BeanProvider.getContextualReference(ContextControl.class);

    // this will implicitly bind a new RequestContext to your current thread
    ctxCtrl.startContext(RequestScoped.class);

    try {
      doYourWork();
    }
    finally {
      // at the end of the Job (even if it failed) we stop the RequestContext
      // to ensure that all beans get properly cleaned up.
      ctxCtrl.stopContext(RequestScoped.class);
    }
  }
}

And there are tons of other situations where this can be useful…

LieGrue,
strub

PS: Gonna change the motd to ‘All is under (cdi-) control!’
PPS: we will most probably introduce a similar API in the CDI-1.1 specification

[1] https://git-wip-us.apache.org/repos/asf?p=incubator-deltaspike.git
[2] https://issues.apache.org/jira/browse/DELTASPIKE
[3] https://git-wip-us.apache.org/repos/asf?p=incubator-deltaspike.git;a=tree;f=deltaspike/cdictrl/api/src/main/java/org/apache/deltaspike/cdise/api
[4] http://docs.redhat.com/docs/en-US/JBoss_Enterprise_Web_Server/1.0/html/Hibernate_Entity_Manager_Reference_Guide/transactions.html