toString(), equals() and hashCode() in JPA entities

Many users generate toString(), equals() and hashCode() methods in their JPA entities.
But most of the time they underestimate the impact this can have.

This blog post is inspired by a chat I had with Gavin King and Vlad Mihalcea.

Preface: I like to emphasize that I put a big focus on keeping customer code portable across different JPA vendors. Some β€˜Uber trick’ might work in one JPA vendor and totally mess up the others. Each JPA provider is broken in its own very special way. Trust me, I know what I am talking about from both a user and a vendor perspective… The stuff I show here is the least common denominator for JBoss Hibernate, EclipseLink and Apache OpenJPA. Please shout out if you think some of the shown code does not work on one of those JPA containers.

toString()

What’s wrong with most toString() methods in entities?
Well, most of the time developers just use the β€˜generated toString’ shortcut of their IDE to create this method. And that means the generated toString() method usually just reads all the attributes of your entity and prints them.

What happens if you touch an attribute depends to a high degree on which β€˜mode’ your JPA provider runs in. In Hibernate you often have the pure class. In that case not much will happen if you only read the attributes which are not Collections etc. By β€˜using attributes’ I mean this.fieldname, as opposed to using getters like this.getFieldname(). Simply because in that mode Hibernate does not support lazy loading for any other fields. However, if you touch a @OneToMany or an @ElementCollection field then you will force lazy loading the first time toString() gets invoked. It might also behave differently if you use the getters instead of reading the attributes.

And if you use EclipseLink, Apache OpenJPA or even Hibernate in byte-code weaving mode, or if you get a javassist proxy from Hibernate (e.g. from em.getReference()), then you are in even deeper trouble. Because in that case touching the attributes might trigger lazy loading for any other field as well.

I tried to explain how the enhancement or β€˜weaving’ works in JPA in a blog post many years ago: https://struberg.wordpress.com/2012/01/08/jpa-enhancement-done-right/ Parts of it might work a tad differently nowadays, but the basic approach should still be the same.

Note that OpenJPA will generate a toString() method for you if the entity class doesn’t have one. In that case we print the name of the entity and the primary key. And since we know the state of the _loaded fields, we will also not force generating a new PK if the entity didn’t already load one from the sequence.
According to Gavin and Vlad, Hibernate doesn’t generate any toString(). I have no clue whether EclipseLink does.

For JPA implementations other than Apache OpenJPA I suggest you provide a toString() which looks like the following:

@Override
public String toString() {
    return this.getClass().getSimpleName() + "-" + getId();
}

And not a single attribute more.

equals() and hashCode()

This is where Vlad, Gavin and I really disagree.
My personal opinion is that you should not write your own equals() and hashCode() methods for entities.

Vlad wrote a blog post about equals() and hashCode() in the past: https://vladmihalcea.com/2016/06/06/how-to-implement-equals-and-hashcode-using-the-entity-identifier/

As you can see, it’s not exactly easy to write a proper equals() and hashCode() method for JPA entities. Even Vlad’s advanced version has holes, e.g. if you use em.getReference() or em.merge().
In any case, there is one point Gavin, Vlad and I agree upon: generating equals() and hashCode() with IDEs is totally bollocks for JPA entities. Comparing *all* fields is always broken. You would simply not be able to update your database rows πŸ˜‰

IF you like to write an equals() method then compare the ids with a fallback on instance equality. And have the hashCode() always return zero, as shown in Vlad’s blog.
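A minimal sketch of that approach (the Course entity name is just an example):

import java.util.Objects;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Course {
    @Id
    @GeneratedValue
    private Long id;

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true; // fallback: instance equality
        }
        if (!(o instanceof Course)) {
            return false;
        }
        // equal only if both sides already have an id and the ids match
        return id != null && Objects.equals(id, ((Course) o).id);
    }

    @Override
    public int hashCode() {
        return 0; // constant, because the id changes when the entity gets persisted
    }
}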

Another way is to generate a UUID in the constructor or the getId() method. But this is pretty performance intensive and also not very nice to handle on the DB side (large Strings as PK consume a lot more storage in the indexes on disk and in memory).
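If you go down that road anyway, the idea looks like this sketch (the Attachment name is invented):

import java.util.UUID;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Attachment {
    // assigned at construction time, thus stable before and after persisting
    @Id
    private String id = UUID.randomUUID().toString();

    @Override
    public boolean equals(Object o) {
        return this == o
            || (o instanceof Attachment && id.equals(((Attachment) o).id));
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}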

Using ‘natural IDs’ for equals()

That sounds promising. And IF you have a really good natural ID then it’s also a good thing. But most of the time you don’t.

So what makes a good natural ID? It must adhere to the following criteria:

  • it must be unique
  • it must not change

Sadly most natural IDs you can think of are not unique. The social security number (SSN) in most countries? Hah, not unique! Really, there are duplicates in most countries…
Also often used in examples: the ISBN of a book. Toooo bad that those are not unique either… Sometimes the same ISBN references different books, and sometimes the same book has multiple ISBNs assigned.

What about immutability? Sometimes a customer does not have an SSN yet. Or you simply don’t know it YET. Or you only learn it further down the application process. So the SSN is null and only gets filled later. Or you detect a collision with another person and have to assign one of them a new SSN (that really happens more often than you think!). There is also the case where the same physical person got multiple SSNs (that happens more frequently as well).

Many tables also simply don’t have a good natural ID. Romain Manni-Bucau came up with the example of a blog entry. What natural ID does a blog entry have? The date? -> Not unique. The title? -> Can get changed later…

Why do you need equals() and hashCode() at all?

This is a good question. And my answer is: “you don’t !”

The argument why people think it’s needed for JPA entities is that they e.g. have a field like:

@OneToMany 
private Set others;

A HashSet internally of course uses equals() and hashCode(), but why would you need to provide custom ones? In my opinion the ones you implicitly inherit from Object.class are perfectly fine. They give you instance equality. And since per the JPA specification the EntityManager guarantees that you only get exactly one single entity instance for a row in the database, you don’t need more. Doubt it? Then read the JPA specification yourself:

"An EntityManager instance is associated with a persistence context. A persistence context is a set of entity instances in which for any persistent entity identity there is a unique entity instance."

https://docs.oracle.com/javaee/7/api/javax/persistence/EntityManager.html

An exception where instance equality does not work is if you mix managed with detached entity instances. But that is something you should avoid at any cost as my following examples show.

Why you shouldn’t store managed and detached entities in the same Collection

Why would you do that anyway? Instead of storing entities in a Set you can always use a Map. In that case you again need neither equals() nor hashCode() for the whole entity. And even then you might get into trouble.

One example is to have a ‘cache’.
Say you have a university management software which has a Course table. Courses get updated only a few times per year and only by some administrative people. But almost every page in the application reads the information. So what could be more reasonable than to simply store the Course in a shared @ApplicationScoped cache as a Map for, say, an hour? Why don’t I use the cache management provided with some JPA containers? Many reasons. First and foremost, they are not portable. They are also really tricky to configure (I’m talking about real production, not a sample app!). And you want FULL control over the cache!

So, having a cache is really a great idea, but *please* do not store JPA entities in the cache. At least not as long as they are managed. All is fine as long as you only run it locally, click around on your app and only do unit tests. But under heavy load in production (our app had 5 million page hits/day on average) you will hit the following problem:

The JPA specification does not allow an EntityManager to be used from multiple threads at the same time. As a managed entity is bound to an EntityManager, this limitation also affects the entities themselves.
So between the em.find() and a later coursesCache.put(courseId, course) the entity is still in β€˜managed’ mode! And under heavy load it *will* happen that another user gets the still-managed entity from the cache before it got detached (which happens at the tx commit or request end, depending on your setup). Boooommm it goes…
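In code the race looks roughly like this (coursesCache is just an illustrative shared Map):

// DON'T: 'course' is still attached to this request's EntityManager
Course course = em.find(Course.class, courseId);
coursesCache.put(courseId, course);
// another thread can now grab the still-managed entity from the cache
// while this EntityManager is still in use -> concurrent access, Boooommm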

How can you avoid that? Simply use a view object. Normally the full database entities with all their gory attribute details and sub-tables are not needed on an overview course list anyway. So you better use a ‘new’ query:

CourseListVO courseViewItem
  = em.createQuery("SELECT NEW org.myproject.CourseListVO(c.id, c.name, ...) " +
        "FROM Course AS c WHERE ...", CourseListVO.class)
      .getSingleResult();
cache.put(courseId, courseViewItem);

By using such a β€˜new’ query you get instances which are not managed by the container. And btw, it’s also much faster and consumes less memory.

Oh, I’m sure there are things which are still not considered yet…

PS: this is not an easy topic as you might be able to judge from looking at the involved people. Gavin is the inventor of Hibernate and JPA, Vlad is the current Hibernate maintainer. And I was involved in the DODS DB layer of Lutris Enhydra in the 90s and am a long time Apache OpenJPA committer (and even the current PMC chair).

Transaction handling in EJBs and JavaEE7 @Transactional

Handling transactions in EJBs is easy, right? Well, in theory it should be. But how does the theory translate into reality once you leave the ivory tower?

I’ll show you a small example. Let’s assume we have 2 infrastructure service EJBs:

@Stateless
public class StorageServiceImpl implements StorageService {
  private @EJB CustomerService customerService;
  private @PersistenceContext EntityManager em;

  public void chargeStorage(int forYear) throws CustomerNotFoundException {
    storeNiceLetterInDb(em);
    Customer c = customerService.getCurrentCustomer(); 
    doSomethingElseInDB(); 
  }
} 

And now for the CustomerService which is an EJB as well:

@Stateless
public class CustomerServiceImpl implements CustomerService {
  public Customer getCurrentCustomer() throws CustomerNotFoundException {
    // do something if there is a current customer
    // otherwise throw a CustomerNotFoundException
  }
}

The Sunshine Case

Let’s first look at what happens if no problems occur at runtime.

In the normal operation mode some (e.g. JSF) backing bean will call storageService.chargeStorage(2015);. The implicit transaction interceptor will use a TransactionManager (all done in the interceptor, which you do not see in your code) to check whether a transaction is already open. If not, it will open a new transaction and remember this fact. The same check will happen in the implicit transaction interceptor of the CustomerService.

When leaving CustomerService#getCurrentCustomer the interceptor will recognize that it didn’t open the transaction and thus will simply return. Otoh, when leaving StorageService#chargeStorage, its interceptor will commit the transaction and close the EntityManager.
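Greatly simplified, such an implicit interceptor behaves like the following sketch. This is illustrative pseudo-code, not real container code:

import javax.interceptor.InvocationContext;
import javax.transaction.Status;
import javax.transaction.TransactionManager;

public class ImplicitTransactionInterceptor {
    private TransactionManager transactionManager; // provided by the container

    public Object invoke(InvocationContext ctx) throws Exception {
        // remember whether *this* interceptor opened the transaction
        boolean startedHere
            = transactionManager.getStatus() == Status.STATUS_NO_TRANSACTION;
        if (startedHere) {
            transactionManager.begin();
        }
        try {
            return ctx.proceed();
        } finally {
            // only the interceptor which opened the tx also ends it
            if (startedHere) {
                transactionManager.commit(); // exception handling: see below
            }
        }
    }
}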

Broken?: Handling checked Exceptions

Once we leave the sunny side of the street and hit some problems, the whole handling starts to become messy. Let’s look at what happens if a checked CustomerNotFoundException is thrown in CustomerService#getCurrentCustomer. Most people will now get their first surprise: the database changes done in storeNiceLetterInDb() will get committed into the database.

So we got an Exception but the transaction still got committed? WT*piep*!
Too bad that this is not a bug: the behaviour is exactly as specified in β€œ9.2.1 Application Exceptions” of the EJB specification:

An application exception does not automatically result in marking the transaction for rollback unless the ApplicationException annotation is applied to the exception class and is specified with the rollback element value true…

So this means we could annotate the CustomerNotFoundException with @javax.ejb.ApplicationException(rollback=true) to force a rollback.
And of course we need to do this for ALL checked exceptions if we like to get a rollback.
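For our example exception that would boil down to:

import javax.ejb.ApplicationException;

// rollback=true forces the tx rollback despite this being a checked exception
@ApplicationException(rollback = true)
public class CustomerNotFoundException extends Exception {
}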

Broken?: Handling unchecked Exceptions

The good news upfront: unchecked exceptions (RuntimeExceptions) will usually cause a rollback of your transaction (unless annotated with @ApplicationException(rollback=false), of course).

Let’s assume there is some other entity lookup in the code and we get a javax.persistence.EntityNotFoundException if the address of the customer couldn’t be found. This will rollback your transaction.

But what can we do if this is kind of expected and you just like to use a default address in that case? The natural solution would be to simply catch this Exception in the calling method. In our case that would be a try/catch block in StorageServiceImpl#chargeStorage.

That’s a great idea – but it doesn’t work in many containers!

Some containers interpret the spec pretty strictly and do the Exception check on _every_ layer (EJB spec 9.3.6). And if the interceptor of the CustomerService detects an Exception, then the implicit EJB interceptor will simply roll back the whole transaction and mark it as β€œrollbackOnly”. Catching this Exception on an outer level doesn’t help a bit. You will not get your changes into the database. And if you try to do even more on the database, then you will blow up again with something like β€œThe connection was already marked for rollback”.

And how is that with @javax.transaction.Transactional?

Basically the same as with EJBs. In my opinion this was a missed chance to clean up this behaviour.
You can read this up in chapter 3.6 of the JTA-1.2 specification.

The main difference is how to demarcate rollback vs commit exceptions. You can use the rollbackOn and dontRollbackOn attributes of @Transactional:

@Transactional(rollbackOn={SQLException.class}, dontRollbackOn={SQLWarning.class})
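Applied to the earlier example it could look like this sketch:

import java.sql.SQLException;
import java.sql.SQLWarning;
import javax.transaction.Transactional;

public class StorageServiceImpl implements StorageService {
    // rolls back on SQLException, but still commits if only an SQLWarning occurs
    @Transactional(rollbackOn = SQLException.class, dontRollbackOn = SQLWarning.class)
    public void chargeStorage(int forYear) throws CustomerNotFoundException {
        // same business logic as in the EJB variant above
    }
}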

Now what about DeltaSpike @Transactional?

In Apache DeltaSpike @Transactional and its predecessor Apache MyFaces CODI @Transactional we have a much cleaner handling:

Exceptions only get handled on the layer where the transaction got opened. If you catch an Exception along the way, then we do not care about it.

Any Exception on the outermost layer will cause a rollback of your transaction. It doesn’t matter if it is a RuntimeException or a checked Exception.

If there was no Exception in the outermost interceptor then we will commit the transaction.

PS: please note that I explicitly used interfaces in my samples. Otherwise you will get NIV (No Interface View) objects, which again might behave slightly differently as they use a totally different proxying technique and default behaviour. But that might be enough material for yet another blog post of its own.
PPS: I also spared you EJBs with TransactionManagementType.BEAN. That one is also pretty much non-portable by design, as you effectively cannot nest those beans: they force you to either commit or roll back the tx on every layer. Some containers work fine while others really enforce this.

Using JPA in real projects (part 1)

Is JPA only good for samples?

This question is of course a bit provocative. But if you look at all the JPA samples out in the wild, then none of them can be applied to real world projects without fundamental changes.

This post tries to cover a few JPA aspects as well as showing off some maven-foo from a big real world project. I am personally using Apache OpenJPA because it works well and I’m a committer on the project (which means I can immediately fix bugs if I hit one). I will try to motivate my friends from JBoss to provide a parallel guide for Hibernate and maybe we even find some Glassfish/EclipseLink geek.

One of the most fundamental differences between the various JPA providers is where they store the state information for the loaded entities. OpenJPA stores this info directly in the entities (EclipseLink as well, afaik), while Hibernate stores it in the EntityManager and sometimes in the 1:n proxies used for lazy loading (if no weaving is used). All this is not defined in the spec but product-specific behaviour. Please always keep this in mind when applying JPA techniques to another JPA provider.

I’ll have to split this article into 2 parts, otherwise it would be too much for a good read. Today’s part will focus on the general project setup, the 2nd one will cover some coding practices usable for JPA based projects.

The Project Infrastructure and Setup

A general note on my project structure: my project is not a sample but fairly big (40k users, 5 million page hits, 600++ JSF pages) and consists of 10++ WebApps, each of them having their own backend (JPA + db + businesslogic), frontend (JSF and backing beans) and api (remote APIs) JARs. Thus I have all my shared configuration in myprj/parent/fe, myprj/parent/be and myprj/parent/api maven modules, containing the pom.xml pointed to as <parent> by all backends, frontends and apis respectively.

β”œβ”€β”€ parent
β”‚   β”œβ”€β”€ api
β”‚   β”œβ”€β”€ be (<- here I keep all my shared backend configuration)
β”‚   β”œβ”€β”€ fe
β”œβ”€β”€ webapp1
β”‚   β”œβ”€β”€ api
β”‚   β”œβ”€β”€ be (referencing ../../parent/be/pom.xml)
β”‚   └── fe
β”œβ”€β”€ webapp2
β”‚   β”œβ”€β”€ api
β”‚   β”œβ”€β”€ be (referencing ../../parent/be/pom.xml)
β”‚   └── fe
...

Backend Unit Test Setup

1. All my backend unit tests use TestNG and really do hit the database! A business process test which doesn’t touch the database is worth nothing imo…
We are using a local MySQL installation for the tests and an Apache Maven profile for switching to other databases like Oracle and PostgreSQL (which we both use in production).

2. We have a special TestNG test group called createData, on which other tests can depend via @Test(dependsOnGroups="createData"). Or we just use @Test(dependsOnMethods="myTestMethodCreatingTheData").
That way all tests which create some pretty complex set of test data run first. All tests which need this data as the base for their own work will run afterwards (see the sketch after this list).

3. Each test must be re-runnable and clean up its own mess in @BeforeClass. We use @BeforeClass because this also works if you kill your test in the debugger. Nice goodie: you can also check the produced data in the database later on. Too bad that there is no easy way to automatically prove this. The best bet is to make all your colleagues aware of it and tell them that they have to throw the next party if they introduce a broken or un-repeatable test πŸ˜‰
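As a sketch of points 2 and 3 (class and method names invented):

import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class CustomerBusinessTest {
    @BeforeClass
    public void cleanup() {
        // remove leftovers of previous runs, so the test is re-runnable
    }

    @Test(groups = "createData")
    public void createBaseCustomers() {
        // creates the complex set of test data in the database
    }

    @Test(dependsOnGroups = "createData")
    public void chargeStorageForCustomers() {
        // runs afterwards and relies on the data created above
    }
}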

The Enhancement Question

I’ve outlined the details and pitfalls of JPA enhancement in a previous post.
I’m a big fan of build-time enhancement because it a.) works nicely with OpenJPA and b.) makes my TestNG unit tests run much faster (because I only enhance those entities once). I also like the fact that I know exactly what will run on the server, and my unit tests will hit side effects early on. In a big project you’ll hit enhancement and state side effects which let your app act differently in unit tests and on the EE server more often than you’d guess.
Of course, this might differ if you use another JPA provider.

For enabling build-time enhancement with OpenJPA I have the following in my parent-be.pom:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.openjpa</groupId>
                <artifactId>openjpa-maven-plugin</artifactId>
                <version>${openjpa.version}</version>
                <configuration>
                    <includes>
                        ${jpa-includes}
                    </includes>
                    <excludes>
                        ${jpa-excludes}
                    </excludes>
                    <addDefaultConstructor>true</addDefaultConstructor>
                    <enforcePropertyRestrictions>true</enforcePropertyRestrictions>
                    <sqlAction>${openjpa.sql.action}</sqlAction>
                    <sqlFile>${project.build.directory}/database.sql</sqlFile>
                    <connectionDriverName>com.mchange.v2.c3p0.ComboPooledDataSource</connectionDriverName>
                    <connectionProperties>
                        driverClass=${database.driver.name},
                        jdbcUrl=${database.connection.url},
                        user=${database.user},
                        password=${database.password},
                        minPoolSize=5,
                        acquireRetryAttempts=3,
                        maxPoolSize=20
                    </connectionProperties>
                </configuration>
                <executions>
                    <execution>
                        <id>mappingtool</id>
                        <phase>process-classes</phase>
                        <goals>
                            <goal>enhance</goal>
                        </goals>
                    </execution>
                </executions>
                <dependencies>
                    <dependency>
                        <groupId>log4j</groupId>
                        <artifactId>log4j</artifactId>
                        <version>1.2.12</version>
                    </dependency>
                    <dependency>
                        <!-- 
                          otherwise you get ClassNotFoundExceptions during 
                          the code coverage report run
                        -->
                        <groupId>net.sourceforge.cobertura</groupId>
                        <artifactId>cobertura</artifactId>
                        <version>1.9.2</version>
                    </dependency>
                    <dependency>
                        <groupId>c3p0</groupId>
                        <artifactId>c3p0</artifactId>
                        <version>${c3p0.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>mysql</groupId>
                        <artifactId>mysql-connector-java</artifactId>
                        <version>${mysql-connector.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>com.oracle</groupId>
                        <artifactId>ojdbc14</artifactId>
                        <version>${ojdbc.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>postgresql</groupId>
                        <artifactId>postgresql</artifactId>
                        <version>${postrgresql-jdbc.version}</version>
                    </dependency>
                </dependencies>
            </plugin>
        </plugins>
    </build>

You might have spotted a few maven properties which I later define in each project’s pom. That way I can keep my common configuration generic and still have a way to tweak the behaviour for each sub-project. Again a nice benefit: you can easily use mvn -Dsomeproperty=anothervalue to tweak those settings on the commandline.

  • ${jpa-includes} for defining the comma separated list of classes which should get enhanced, e.g. "mycomp/project/modulea/backend/*.class,mycomp/project/modulea/backend/otherstuff/*.class" (see the example after this list)
  • ${jpa-excludes} the opposite of jpa-includes
  • ${openjpa.sql.action} to define what should be done during DB schema creation. This can be build to always create the whole DB schema (CREATE TABLE statements), or refresh to generate only ALTER TABLE statements for the changes. I’ll come back to this later.
  • ${database.driver.name} and the credentials properties are used to run the schema creation against Oracle, MySQL and PostgreSQL (switched via maven profiles).
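In a sub-project’s pom.xml the definitions might then look like this (all values are made up for illustration):

    <properties>
        <jpa-includes>mycomp/project/modulea/backend/*.class</jpa-includes>
        <jpa-excludes></jpa-excludes>
        <openjpa.sql.action>build</openjpa.sql.action>
        <database.driver.name>com.mysql.jdbc.Driver</database.driver.name>
    </properties>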

Creating the Database

For doing tests with a real database we of course need to create the schema first. We do NOT let JPA do any automatic database schema changes on JPA-startup. Doing so might unrecoverably trash your production database, so it’s always turned off!

Instead we trigger the SQL schema creation process manually via the Apache OpenJPA openjpa-maven-plugin (for the configuration see above):

$> mvn openjpa:sql

Then we check the generated SQL in target/database.sql and copy it to the structure we have in each of our backend projects:

webapp1/be/src/main/sql/
β”œβ”€β”€ mysql
β”‚   β”œβ”€β”€ createdb.sql
β”‚   β”œβ”€β”€ createindex.sql
β”‚   β”œβ”€β”€ database.sql
β”‚   └── schema_delta.sql
β”œβ”€β”€ oracle
β”‚   β”œβ”€β”€ createdb.sql
β”‚   β”œβ”€β”€ createindex.sql
β”‚   β”œβ”€β”€ database.sql
β”‚   └── schema_delta.sql
└── postgres
    β”œβ”€β”€ createdb.sql
    β”œβ”€β”€ createindex.sql
    β”œβ”€β”€ database.sql
    └── schema_delta.sql

The following files are involved in the db setup:

createdb.sql

This file creates the database itself. It is optional, as not every database supports creating a whole database. In MySQL we just do the following:

DROP DATABASE IF EXISTS ProjectXDatabase;
CREATE DATABASE ProjectXDatabase CHARACTER SET utf8;
USE ProjectXDatabase;

In Oracle this is not that easy. It’s a major pain to drop and then set up a whole data store. A major problem is that you cannot easily access a data store which doesn’t exist anymore via Oracle’s JDBC driver. Instead, we just drop all the tables:

DROP TABLE MyTable CASCADE constraints PURGE;
DROP TABLE AnotherTable CASCADE constraints PURGE;
...

If you have a better idea, then please speak up πŸ˜‰

database.sql

This is the exact 1:1 DDL/schema file we generated via JPA (in my case via the openjpa-maven-plugin’s mvn openjpa:sql mentioned above). It is simply copied over from target/database.sql; the content remains unchanged. It runs after the createdb.sql file.

createindex.sql

This file contains the initial index tweaks which were not generated in the DDL. In Oracle and PostgreSQL this file e.g. contains all the indices on foreign keys, because OpenJPA doesn’t generate them (I remember that Hibernate does, correct?). In MySQL we don’t need those, because MySQL automatically adds indices for foreign keys itself.

But this is of course a good place to add all the performance tuning stuff you ever wanted πŸ˜‰
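For illustration, a typical entry might simply be (table and column names invented):

CREATE INDEX idx_course_teacher ON Course (teacher_id);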

schema_delta.sql

This one is really a goldie! Once a project goes into production we do not generate full database schemas anymore! Instead we switch the openjpa-maven-plugin to the refresh mode. In this mode OpenJPA will compare the entities with the state of the configured database and only generate ALTER TABLE and similar statements for the changes in target/database.sql. This works surprisingly well!

We then review the generated schema changes and append the content to src/main/sql/[dbvendor]/schema_delta.sql. Of course we also add clean comments about the product revision in which the change got made. That way an administrator just picks the last n entries from this file and is easily able to bring the production database to the latest revision.
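An entry in schema_delta.sql might then look like this (revision number and column are invented):

-- r4711 (release 2.3): Course now supports discounts
ALTER TABLE Course ADD discount DECIMAL(5,2);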

Doing this step manually is very important! From time to time there are changes (renaming a column for example) which cannot be handled by the generated DDL. Such changes or small migration updates need to be maintained manually.

How to create the DB for my tests?

This one is pretty easy if you know the trick: we just make use of the sql-maven-plugin.

Here is the configuration I use in my project:

    <profiles>
        <!-- Default profile for surefire with MySQL: creates database, imports testdata and runs all unit tests -->
        <profile>
            <id>default</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>sql-maven-plugin</artifactId>
                        <configuration>
                            <driver>com.mysql.jdbc.Driver</driver>
                            <url>jdbc:mysql://localhost/</url>
                            <username>root</username>
                            <password/>
                            <escapeProcessing>false</escapeProcessing>

                            <srcFiles>
                                <srcFile>src/main/sql/mysql/createdb.sql</srcFile>
                                <srcFile>src/main/sql/mysql/database.sql</srcFile>
                                <srcFile>src/main/sql/mysql/schema_delta.sql</srcFile>
                                <srcFile>src/main/sql/mysql/createindex.sql</srcFile>
                                <srcFile>src/test/sql/mysql/testdata.sql</srcFile>
                            </srcFiles>
                        </configuration>

                        <executions>
                            <execution>
                                <id>setup-test-database</id>
                                <phase>process-test-resources</phase>
                                <goals>
                                    <goal>execute</goal>
                                </goals>
                            </execution>
                        </executions>

                        <dependencies>
                            <dependency>
                                <groupId>mysql</groupId>
                                <artifactId>mysql-connector-java</artifactId>
                                <version>${mysql-connector.version}</version>
                                <scope>runtime</scope>
                            </dependency>
                        </dependencies>
                    </plugin>
                </plugins>
            </build>
        </profile>

        <profile>
            <!-- this one skips the sql plugin and thus the tests' db setup -->
            <id>skipSql</id>
        </profile>

        <!-- ... add profiles for Oracle and PostgreSQL accordingly -->
    </profiles>

Whenever you run your build, the database will be freshly set up in the process-test-resources phase. The database will then be exactly as in production!

Guess we are now basically ready to start hacking on our project!

The 2nd part will focus on how to handle JPA stuff in the application code. Stay tuned!

LieGrue,
strub

Is there a way to fix the JPA EntityManager?

Using JPA is easy for small projects, but it has well-hidden problems which are caused by some very basic design decisions. Quite a few of them are caused by the fact that the EntityManager cannot be made Serializable. Although there are some JPA providers which claim serializability (Hibernate), they really aren’t!

Is the EntityManager Serializable?

The LazyInitializationException is a pretty bad beast if you have ever worked with EJB-managed EntityManagers. That problem caused lots of people to look for alternative ways. Two of the most prominent are JBoss Seam2 if you are working with the JBoss stack, and Apache MyFaces Orchestra for Spring applications.

The basic problems are summed up very well in the by and large still correct Apache MyFaces Orchestra documentation:
Apache MyFaces Orchestra Persistence explanation

If you read through the whole page, you will see the TODOs at the very bottom of the page:

TODO: is the persistence-context serializable? Are all persistent objects in the context always serializable?

The simple answer is: NO, not at all! Neither the EntityManager nor the state in the entities is Serializable as per the current JPA specification!

Why is the EntityManager not Serializable?

There are a few reasons:

1. Pessimistic Locking

The biggest blocker first: JPA doesn’t only support optimistic locking but also pessimistic locking. You can declare this in your persistence.xml and also request it programmatically via the LockModeType in many methods:

EntityManager#find(java.lang.Class entityClass, java.lang.Object primaryKey, LockModeType lockMode) 
EntityManager#lock(java.lang.Object entity, LockModeType lockMode) 
...

But if you ever use pessimistic locking (a real hard lock on the database) the connection is bound to the database and cannot be ‘transferred’ to another EntityManager without losing the lock.

2. Id and Version fields are optional

To use the optimistic locking approach, a primary key plus some ‘version’ field must be used in the entity:

UPDATE tableX SET [somevalues], version = :oldversion + 1 WHERE id = :myId AND version = :oldversion

Obviously this update can only succeed once. Trying to update the row a second time will not find any matching database entry because the version = :oldversion condition will not be true anymore.

When you use optimistic locking in JPA, you will always have such a β€˜version’ column in the database already. But there is no need yet to declare it as a @Version field in the entity! Thus this information will not be transported along if you serialize the entity!

To fully support optimistic locking, those entities will need mandatory @Id and @Version columns.
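Such an entity would then always carry both pieces of information, roughly like this sketch (names invented):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Invoice {
    @Id
    @GeneratedValue
    private Long id;

    @Version
    private int version; // bumped by the JPA provider on every UPDATE

    // ... business fields
}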

3. Losing the entity state information

As outlined in a previous blog post every JPA entity will get ‘enhanced’ with some magic code which tracks _loaded and _dirty state information. Those BitFlags will track the parts of the entity which got changed or fetched lazily.

The problem in this area is mostly caused by the JPA spec, which by default prevents the JPA providers from serializing the β€˜enhanced entities’ but instead requires serializing the β€˜native’ information. At least that seems to be the common understanding of the following paragraph in the JPA spec:

β€œSerializing entities and merging those entities back into a persistence context may not be interoperable across vendors when lazy properties or fields and/or relationships are used.
A vendor is required to support the serialization and subsequent deserialization and merging of detached entity instances (which may contain lazy properties or fields and/or relationships that have not been fetched) back into a separate JVM instance of that vendor’s runtime, where both runtime instances have access to the entity classes and any required vendor persistence implementation classes.”

Of course, most JPA providers know a way to enable the serialization of the state fields. In OpenJPA just provide the following magic properties to your persistence.xml:

<property name="openjpa.DetachState" value="loaded(DetachedStateField=true)"/>
<property name="openjpa.Compatibility" value="IgnoreDetachedStateFieldForProxySerialization=true"/>

This will also serialize the _loaded and _dirty BitFlags along with your entity.

The problem with having the EntityManager not Serializable

Well, this one is easy:

  • You cannot store the EntityManager in a Conversation
  • You cannot store the EntityManager in a Session
  • You cannot store the EntityManager in a JSF View
  • No clustering, because clustering means that you need to serialize the state

What can you do today?

Today the only working solution is the entitymanager-per-request pattern: basically creating a @RequestScoped EntityManager, e.g. via a CDI @Produces method, for each and every request. That also means that you need to manually merge those entities back on the callback; if you use JSF, that is in your action.
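A minimal sketch of that pattern (the unit name β€œprojectX” is just a placeholder):

import javax.enterprise.context.RequestScoped;
import javax.enterprise.inject.Disposes;
import javax.enterprise.inject.Produces;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.PersistenceUnit;

public class EntityManagerProducer {
    @PersistenceUnit(unitName = "projectX")
    private EntityManagerFactory emf;

    // one EntityManager per request; disposed when the request ends
    @Produces
    @RequestScoped
    public EntityManager createEntityManager() {
        return emf.createEntityManager();
    }

    public void close(@Disposes EntityManager em) {
        if (em.isOpen()) {
            em.close();
        }
    }
}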


How to fix the JPA EntityManager in the future?

Here are my thoughts about how we can do better in the future. Please note that there is a project called Avaje eBean which is not JPA compliant but has already successfully implemented those ideas.

Provide an OptimisticEntityManager

public interface OptimisticEntityManager extends EntityManager, Serializable

The most important change here is that it implements the java.io.Serializable interface.
This OptimisticEntityManager should throw a NonOptimisticModeException whenever one tries to execute an operation on the EntityManager which requires a non-optimistic LockModeType, or any other operation which creates a lock or otherwise non-serializable behaviour.

There should be a way to explicitly request an OptimisticEntityManager, e.g. via

OptimisticEntityManager EntityManagerFactory#createOptimisticEntityManager();

Make @Id and @Version mandatory for those Entities

This will solve the problem with losing the optimistic lock information when serializing.

Define _loaded and _dirty Serialization

The future JPA spec could either clarify that serialization is more important than JPA-vendor inter-compatibility (who uses 2 different JPA providers in the same environment anyway?).
Or it could just specify that the 2 BitFlags can be passed in the serialized entity and how they should behave.

Please tell me what you think! Are we missing something? It’s not an easy move, but up to now I think it is doable!

PS: Thanks to Shane Bryzak and Jason Porter for helping me get rid of the worst English grammar and wording issues at least. Hope you folks got the gist regardless of my bad English πŸ˜‰