An idea about adding Ownership to the Java Memory Model

We are fortunate to live in an era of great innovation. Three years ago I was laughed at for suggesting that some reference tracking be added to the Java GC. Now we have G1 and Shenandoah (Azul C4 was already a big thing), which are aware of some references.

Today I want to suggest one more idea. It is not original; it comes from Rust. But I am going to write down some initial design notes about the runtime and compiler support for this concept.

The idea of Ownership in Rust is brilliant. The developer tells the compiler that this local variable never leaves this scope. Rust has complex scope management (named scopes!), but I am mostly interested in just one class of scope: the scope of a function.

Let's consider an example. A developer is writing a JEE application and she has written a filter that buffers the input stream of an HttpServletRequest. She knows that a single worker thread owns that buffer. Once all of the request processing is done, the life of the buffer is over. Currently, neither the compiler nor the runtime is aware of that. The buffer stays alive until the next minor GC. If the request processing was slow, the buffer even has a chance to mature into the old gen.

Can we design a system that lets the developer express the scoped nature of such objects, so that the compiler and runtime can make use of it? We can. If we do, we will be able to manage those objects more efficiently.

Let's think of a Java-friendly syntax first, and then I will talk about the JMM and runtime implications. I will leave the compiler changes only roughly outlined.

public int calculateSomething() {
    @Scoped
    Duration durationTracking = new Duration(new Date());
    int result = doTheCalculation(durationTracking);

    reportToLog(durationTracking);
    return result;
}

Annotations have proved to be developer friendly. If this code is compiled by a compiler that does not support the ownership concept, the annotation can simply be ignored.
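For the example above to compile at all, the annotation has to be declared somewhere. Here is a minimal sketch of what that declaration could look like. The name `Scoped` comes from the example; the target and retention settings are my assumptions, and nothing in today's JDK attaches any meaning to it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation. Target and retention are assumptions:
// the proposal needs it on local variables, and only the compiler reads it.
@Target(ElementType.LOCAL_VARIABLE)
@Retention(RetentionPolicy.SOURCE)
@interface Scoped {
}

class ScopedDemo {
    static int lengthOf(String text) {
        // A compiler without ownership support sees an ordinary local variable.
        @Scoped
        StringBuilder buffer = new StringBuilder(text);
        return buffer.length();
    }
}
```

With a current compiler this behaves exactly like the same method without the annotation, which is the backward-compatibility property I am after.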

What limitations do we have to impose on a scoped reference?

  • Scoped references cannot be placed into collections
  • Scoped references cannot be assigned to fields
  • Scoped references cannot be passed to other threads
  • Scoped references cannot be placed into a ThreadLocal

In a normal Java application there are plenty of objects that are allocated by a worker thread doing its job, but that are never accessed by other threads. The Java runtime is already taking advantage of that in its lightweight CAS-based synchronization. The developer can still pass those references as method arguments, and can still assign a reference to other local variables.
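To make the rules concrete, here is a small sketch of what stays legal and what a scope-aware compiler would be expected to reject. The `Scoped` marker and the `ScopedRules` class are hypothetical. The rejected lines are commented out because they compile fine with today's compiler, which knows nothing about these rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker, declared here only so the example compiles on its own.
@interface Scoped {}

class ScopedRules {
    private static final List<StringBuilder> CACHE = new ArrayList<>();
    private StringBuilder field;

    static int describe(StringBuilder sb) {
        return sb.length();
    }

    static int run() {
        @Scoped
        StringBuilder buffer = new StringBuilder("request body");

        StringBuilder alias = buffer;  // another local variable: allowed
        int length = describe(alias);  // method argument: allowed

        // Under the proposal, each of these would be a compile-time error:
        // CACHE.add(buffer);                            // placed into a collection
        // new ScopedRules().field = buffer;             // assigned to a field
        // new Thread(() -> buffer.append('x')).start(); // passed to another thread
        // new ThreadLocal<StringBuilder>().set(buffer); // placed into a ThreadLocal

        return length;
    }
}
```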

The compiler would need to enforce those aspects of the JMM as much as possible. It would also need to place a call that clears scoped objects at every exit point of a method that has instantiated any, much like it already does for try-finally.

The runtime side is interesting as well.
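The exit-point handling can be pictured as a desugaring into try-finally. The sketch below assumes a hypothetical `ScopeRuntime` hook with `enter()` and `exit()`. Both names are invented for illustration, and this stand-in only counts nesting depth instead of freeing memory.

```java
// Hypothetical runtime hook; this stand-in only tracks how many scopes are open.
final class ScopeRuntime {
    private static final ThreadLocal<int[]> DEPTH =
            ThreadLocal.withInitial(() -> new int[1]);

    static void enter() { DEPTH.get()[0]++; }
    static void exit()  { DEPTH.get()[0]--; }
    static int depth()  { return DEPTH.get()[0]; }
}

class Desugared {
    // Roughly what a compiler might emit for a method that declares a @Scoped local.
    static int calculateSomething(int input) {
        ScopeRuntime.enter();
        try {
            // Stands in for the scoped allocation.
            StringBuilder scratch = new StringBuilder();
            scratch.append(input);
            if (input < 0) {
                throw new IllegalArgumentException("negative input");
            }
            return scratch.length(); // normal exit point
        } finally {
            // Runs on a normal return and on an exception alike.
            ScopeRuntime.exit();
        }
    }
}
```

The point of the finally block is that the scope is closed on every path out of the method, so the developer never has to write the cleanup herself.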

We will need a separate off-heap space in which we allocate short-lived scoped objects. Just think about the cache locality gains. Each thread has its own slab. I do not think we should ever change its size at runtime. The user will be able to tweak the size of the per-thread allocation, just like she can now define the per-thread stack size.
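A per-thread slab like this is essentially a bump-pointer allocator that is reset when the scope ends. Here is a rough sketch in plain Java, using a direct `ByteBuffer` as the off-heap region. The class name `ScopedSlab`, the 64 KB default, the 8-byte alignment, and the choice to throw when the slab is full are all my assumptions. A real implementation would live inside the JVM, not in library code.

```java
import java.nio.ByteBuffer;

// Sketch of a fixed-size, per-thread, off-heap slab with bump-pointer allocation.
final class ScopedSlab {
    // Assumed default; the idea is that the user could tune this per thread.
    static final int SLAB_BYTES = 64 * 1024;

    private static final ThreadLocal<ScopedSlab> CURRENT =
            ThreadLocal.withInitial(() -> new ScopedSlab(SLAB_BYTES));

    private final ByteBuffer memory; // off-heap region, never resized
    private int top;                 // next free offset

    ScopedSlab(int bytes) {
        this.memory = ByteBuffer.allocateDirect(bytes);
    }

    static ScopedSlab current() { return CURRENT.get(); }

    // Remembers the current position so a scope can later free back to it.
    int mark() { return top; }

    // Reserves a block rounded up to 8 bytes and returns its offset.
    int allocate(int bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size");
        }
        long aligned = (bytes + 7L) & ~7L;
        if (aligned > memory.capacity() - top) {
            throw new OutOfMemoryError("scoped slab exhausted");
        }
        int offset = top;
        top += (int) aligned;
        return offset;
    }

    // Frees everything allocated since the matching mark() in one step.
    void release(int mark) { top = mark; }

    int used() { return top; }
}
```

Releasing a whole scope is a single assignment, no matter how many objects were allocated in it.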

But we also need to guard against a crafty developer passing the reference to a scoped object out of the scope. To identify the scope we only really need to know a thread id, and we do not allow any sort of multi-threaded access to scoped objects. That means that we can re-use the light lock structure in the OOP for scopes. In effect, scoped instances will be allocated with the "locked" bit set and the current thread's id in the same "owner" part of the OOP. We may need one more bit in that OOP to indicate that the object is scoped, but OOP structure changes are expensive and I will try to avoid them. Scoped objects are allocated in a separate space, and that should be enough information at runtime to properly detect them in every sensitive place.
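The ownership guard itself is simple. Below is a sketch that simulates it in plain Java, with a hypothetical `ScopedHeader` class standing in for the owner slot of the object header. In the real design this check would be done by the JVM, and which exception it raises is my assumption.

```java
// Simulated header: records which thread owns a scoped object.
final class ScopedHeader {
    final long ownerThreadId;

    ScopedHeader(long ownerThreadId) {
        this.ownerThreadId = ownerThreadId;
    }

    // Mirrors "allocated with the current thread's id in the owner part".
    static ScopedHeader allocateForCurrentThread() {
        return new ScopedHeader(Thread.currentThread().getId());
    }

    // The guard the runtime would run in every sensitive place.
    void checkAccess() {
        long current = Thread.currentThread().getId();
        if (current != ownerThreadId) {
            throw new IllegalStateException(
                    "scoped object owned by thread " + ownerThreadId
                            + " touched by thread " + current);
        }
    }
}
```

The owning thread passes the check for free, while any other thread that gets hold of the reference fails on first contact.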

By not allowing scoped objects to be placed into collections, we guarantee that we do not inflict extra complications on the collections and streams frameworks.

There is a big question about @Scoped propagation. If the developer instantiates a Duration, and its constructor instantiates a Date and a Long that are not annotated with @Scoped, where would those referenced instances be allocated? For safety, we may go with no propagation. But that will damage cache locality. I think the compiler would need to be smart and propagate @Scoped when it can prove that an internally instantiated object conforms to the @Scoped conditions.

The gains are big. Many applications are written so that they allocate a set of more or less static data when the application starts, and then they start worker threads. Worker threads allocate short-lived objects. We are doing a great job of handling them in the Young Generation of the heap, but it is still not uncommon to see them promoted to the Old Gen. Using scoped objects, we would reduce the performance cost of GC without inflicting any pain on the developer.

Rust borrowed ideas from Java. Can Java borrow some back?
