The right Scope for JBatch Artifacts

In my recent JavaLand conference talk about JSR-352 JBatch and Apache BatchEE I briefly mentioned that JBatch Artifacts should have a scope of @Dependent (or Prototype scope if you are using Spring). Too bad there was not enough time to dig into the problem in depth so here comes the detailed explanation.

What is a JBatch Artifact

A JBatch batch needs a Job Specification Language XML file in META-INF/batch-jobs/*.xml files. These files describes how your batch job is built up.

Here is a small example of how such a batch JSL file could look like

<job id="mysamplebatch" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
  <step id="mysample-step">
    <listeners>
      <listener ref="batchUserListener" >
      <properties>
        <property name="batchUser" value="#{batchUser}"/>
      </properties>
      </listener>
    </listeners>
    <batchlet ref="myWorkerBatchlet">
      <properties>
        <property name="inputFile" value="#{inputFile}"/>
      </properties>
    </batchlet>
  </step>

In JSR-352 an Artifact are all pieces which are defined in your JBatch JSL file and get requested by the container. In the sample above this would be batchUserListener and myWorkerBatchlet.

The following types can be referenced as Batch Artifacts from within your JSL:

  • Batchlets
  • ItemReader
  • ItemProcessor
  • ItemWriter
  • JobListener
  • StepListener
  • CheckpointAlgorithm
  • Decider
  • PartitionMapper
  • PartitionReducer
  • PartitionAnalyzer
  • PartitionCollector

The Batch Artifact Lifecycle

The JBatch spec is actually pretty clear what lifecycle needs to get applied on Batch Artifacts:

11.1 Batch Artifact Lifecycle
All batch artifacts are instantiated prior to their use in the scope in which they are declared in the Job XML and are valid for the life of their containing scope. There are three scopes that pertain to artifact lifecycle: job, step, and step-partition.
One artifact per Job XML reference is instantiated. In the case of a partitioned step, one artifact per Job XML reference per partition is instantiated. This means job level artifacts are valid for the life of the job. Step level artifacts are valid for the life of the step. Step level artifacts in a partition are valid for the life of the partition.
No artifact instance may be shared across concurrent scopes. The same instance must be used in the applicable scope for a specific Job XML reference.

The problem is that whenever you use a JavaEE artifact then you might get only a proxy. Of course the reference to this proxy gets thrown away correctly but the instance behind the proxy might survive. Let’s look at how this works internally.

How Batch Artifacts get resolved

A JBatch implementation can provide it’s own mechanism to load the artifacts. This is needed as it is obviously different whether you use BatchEE with CDI or if you use Spring Batch (which also implements JSR-352).
In general there are 3 different ways you can reference a Batch Artifact in your JSL:

  1. Via a declaration in an optional META-INF/batch.xml file. See the section 10.7.1 of the specification for further information.
  2. Via it’s fully qualified class name.
    In BatchEE we first try to get the class via BeanManager#getBeans(Type) and BeanManager#getReference. If that doesn’t give us the desired Contextual Reference (CDI proxy) then we simply call ClassLoader#loadClass create the Batch Artifact with newInstance() and apply injection into this instance
  3. Via it’s EL name. More precisely we use BeanManager#getBeans(String elName) plus a call to BeanManager#getReference() as shown above.

We now know what a Batch Artifact is. Whenever you are on a JavaEE Server you will most likely end up with a CDI or EJB Bean which got resolved via the BeanManager. If you are using Spring-Batch then you will most times get a nicely filled Spring bean.

The right Scope for a Batch Artifact

I’ve seen the usage of @javax.ejb.Stateless on Batch Artifacts in many samples. I guess the people writing such samples never used JBatch in real production yet 😉 Why so? Well, let’s look at what would happen if we implement our StepListener as stateless EJB:

@javax.ejb.Stateless
@javax.inject.Named // this makes it available for EL lookup
public class BatchUserListener implements StepListener {
  @Inject 
  @BatchProperty
  private String batchUser;

  @Override
  public void beforeStep() throws Exception {
     setUserForThread(   
  }

  @Override
  public void afterStep() throws Exception {
    clearUserForThread();
  }
}

Now let’s assume that the BatchUserListener gets not only used in my sample batch but in 30 other batches of my company (this ‘sample’ is actually taken from a HUGE real world project where we use Apache BatchEE since over a year now).

What will happen if e.g. a ‘DocumentImport’ batch runs before my sample batch? The first batch who uses this StepListener will create the instance. At the time when the instance gets created by the container (and ONLY at that time) it will also perform all the injection. That means it will look up the ‘batchUser’ parameter and injects it into my @BatchProperty String. Let’s assume this DocumentImport batch uses a ‘documentImportUser’. So this is what we will get injected into the ‘batchUser’ variable;

Once the batch step is done the @Stateless instance might be put back into some pool cache. And if I’m rather unlucky then exactly this very instance will later get re-used for mysample-step. But since the listener already exists there will no injection be performed on that instance. What means that the steplistener STILL contains the ‘documentImportUser’ and not the ‘mySampleUser’ which I explicitly did set as parameter of my batch.

The very same issue also will happen for all injected Variables which do not use proxies, e.g.:

  • @Inject StepContext
  • @Inject JobContext

TL;DR: The Solution

Use @Dependent scoped beans for your Batch Artifacts and only use another scope if you really know what you are doing.

If you like to share code across different items of a Step or a Job then you can also use BatchEE’s @StepScoped and @JobScoped which is available through a portable BatchEE CDI module

Advertisements

About struberg
I'm an Apache Software Foundation member blogging about Java, µC, TheASF, OpenWebBeans, Maven, MyFaces, CODI, GIT, OpenJPA, TomEE, DeltaSpike, ...

2 Responses to The right Scope for JBatch Artifacts

  1. Michael says:

    If you intentionally have chosen @Stateless because your batch artifact e. g. a configurable Batchlet needs Transactions, them removing @Statless is not an option and @Stateless and @Dependent don’t seem to work together either.

    In this case the annotation @Stateful is your friend. It makes your Batchlet an EJB provided with transactions and it bounds the lifecycle of your Batchlet to its client, which is the Step in this case.
    For every Step a new instance of your Batchlet will be created and will be injected with the correct values for @BatchProperty.

    • struberg says:

      indeed, using @Stateful is of course an option as it is by default @Dependent. But you can even use a pure CDI bean @Dependent @Transactional and you’ll get the very same results.
      Both ways are surely better than using @Stateless!
      Even if you don’t inject any Batch artifacts (@BatchProperty, @Inject JobContext, @Inject StepContext, etc) _yet_ then I still recommend against using @Stateless. Simply because it’s too error prone if you change something months later.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: