The secret of Groovy script refresh

The first thing to understand before you integrate scripting support into your application or framework is class loading. One of the main reasons we chose Groovy as our primary scripting language (next to the easy transition from Java) is its very good support for live refresh of Groovy classes when a source file changes. But what exactly does Groovy do when it "refreshes" its loaded classes to match a modified source file? What happens to existing instances of the class? Is it even possible to change a class structure at runtime in the JVM? Yes, JavaRebel can do this, but it needs a special setup and debug mode for HotSwap. And how does all this fit into the existing Spring support? From the documentation it seems that it all just magically works! Dozens of questions ran through my mind when I started working on Groovy integration in our product. This article answers them.

According to the Groovy documentation, there are several options for integrating Groovy into your application:

  • evaluate scripts or expressions using the shell
  • generate and use new Java classes with GroovyClassLoader
  • generate and use new Java classes with GroovyScriptEngine

All three options can refresh code or class behaviour according to changes in the underlying source. The difference between GroovyClassLoader and GroovyScriptEngine in refresh policy is that GroovyScriptEngine can recompile a class even when that class itself hasn't changed but one of the classes it depends on has (consider, for example, a changed constant in an interface your class uses).

Note: several code samples are easier to follow in an IDE. I recommend downloading them and reading them in your preferred IDE.

Basic recompilation principles

Not surprisingly, there is no magic involved. Groovy's automatic recompilation rests on the following facts:

  • the only way to change a class is to load it again
  • it is true that oldClass != recompiledClass &&
    oldClass.getName().equals(recompiledClass.getName())
  • once a Java class is created, objects instantiated from it never change
  • old classes (even when made obsolete by recompiled versions) won't get garbage collected as long as a single instance of them is held somewhere by a strong reference
  • a GroovyClassLoader / GroovyScriptEngine won't get garbage collected as long as a single class it created is held somewhere by a strong reference
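
The second statement can be reproduced without Groovy at all. What follows is only a minimal sketch using the plain JDK: the hand-assembled class file and the names ClassIdentityDemo, OneShotLoader and demo.Script are illustrative assumptions standing in for whatever the Groovy compiler emits after a source change.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassIdentityDemo {

    /** Assembles a minimal class file: "public class <name> extends Object", no members. */
    static byte[] emptyClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                  // magic number
            out.writeShort(0);                         // minor version
            out.writeShort(49);                        // major version (Java 5 class format)
            out.writeShort(5);                         // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);       // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(internalName);        // #2 Utf8
            out.writeByte(7); out.writeShort(4);       // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8
            out.writeShort(0x0021);                    // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                         // this_class
            out.writeShort(3);                         // super_class
            out.writeShort(0);                         // interfaces count
            out.writeShort(0);                         // fields count
            out.writeShort(0);                         // methods count
            out.writeShort(0);                         // attributes count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Defines exactly the classes handed to it, like a fresh loader per recompilation. */
    static final class OneShotLoader extends ClassLoader {
        OneShotLoader() {
            super(ClassIdentityDemo.class.getClassLoader());
        }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Loads the named (empty) class through a brand new loader. */
    static Class<?> loadFresh(String name) {
        return new OneShotLoader().define(name, emptyClassBytes(name.replace('.', '/')));
    }

    public static void main(String[] args) {
        Class<?> oldClass = loadFresh("demo.Script");
        Class<?> recompiledClass = loadFresh("demo.Script");
        System.out.println(oldClass != recompiledClass);
        System.out.println(oldClass.getName().equals(recompiledClass.getName()));
        System.out.println(oldClass.getClassLoader() != recompiledClass.getClassLoader());
    }
}
```

All three lines print true: the two classes share a name but are different Class objects, and each one points back to its own loader, which is why a single retained class keeps its whole loader alive.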

These simple statements can be proven by the following set of tests:

GroovyScriptEngine


public void testReloadGroovyClass() throws Exception {
	//loads the class, creates an instance of it and calls the no-argument method helloWorld; all references
	//are then stored in the custom class GroovyContext
	GroovyContext ctx = createInstanceAndCallMethod(engine, "com.fg.mock.MainGroovyClass", "helloWorld");
	//initial result of helloWorld method is following
	assertEquals("Hello Universe.", ctx.getGroovyResult());
	//we'll change the source file contents so that the method returns a different value
	modifyGroovyFile(
			"com/fg/mock/MainGroovyClass.groovy",
			"return HELLO_UNIVERSE;",
			"return \"Bye bye\";"
	);
	//existing instances will remain unchanged
	assertEquals("Hello Universe.", ctx.getGroovyMethod().invoke(ctx.getGroovyInstance()));
	//new instances will use new definition
	GroovyContext newCtx = createInstanceAndCallMethod(engine, "com.fg.mock.MainGroovyClass", "helloWorld");
	//and as we see, result gets changed
	assertEquals("Bye bye", newCtx.getGroovyResult());
}
public void testReloadGroovyDependentClass() throws Exception {
	//loads the class, creates an instance of it and calls the no-argument method helloWorld; all references
	//are then stored in the custom class GroovyContext
	GroovyContext ctx = createInstanceAndCallMethod(engine, "com.fg.mock.MainGroovyClass", "helloWorld");
	//initial result of helloWorld method is following
	assertEquals("Hello Universe.", ctx.getGroovyResult());
	//we'll change the source file of the Groovy interface that our class implements and uses to derive
	//the return value of the helloWorld method
	modifyGroovyFile(
			"com/fg/mock/GroovyHelloWorldInterface.groovy",
			"HELLO_UNIVERSE = \"Hello Universe.\";",
			"HELLO_UNIVERSE = \"Bye Universe.\";"
	);
	//even though we didn't change the main class, it gets reloaded because a class it depends on changed
	GroovyContext newCtx = createInstanceAndCallMethod(engine, "com.fg.mock.MainGroovyClass", "helloWorld");
	assertEquals("Bye Universe.", newCtx.getGroovyResult());
}

GroovyClassLoader

The only difference in this example is that a class is not reloaded when the source code of another class (or interface) it depends on changes.


public void testReloadLoadGroovyClass() throws Exception {
	//loads the class, creates an instance of it and calls the no-argument method helloWorld; all references
	//are then stored in the custom class GroovyContext
	GroovyContext ctx = createInstanceAndCallMethod(clsLoader, "com.fg.mock.MainGroovyClass", "helloWorld");
	//initial result of helloWorld method is following
	assertEquals("Hello Universe.", ctx.getGroovyResult());
	//we'll change the source file contents so that the method returns a different value
	modifyGroovyFile(
			"com/fg/mock/MainGroovyClass.groovy",
			"return HELLO_UNIVERSE;",
			"return \"Bye bye\";"
	);
	//existing instances will remain unchanged
	assertEquals("Hello Universe.", ctx.getGroovyMethod().invoke(ctx.getGroovyInstance()));
	//new instances will use new definition
	GroovyContext newCtx = createInstanceAndCallMethod(clsLoader, "com.fg.mock.MainGroovyClass", "helloWorld");
	assertEquals("Bye bye", newCtx.getGroovyResult());
}
public void testReloadLoadGroovyDependentClass() throws Exception {
	//loads the class, creates an instance of it and calls the no-argument method helloWorld; all references
	//are then stored in the custom class GroovyContext
	GroovyContext ctx = createInstanceAndCallMethod(clsLoader, "com.fg.mock.MainGroovyClass", "helloWorld");
	//initial result of helloWorld method is following
	assertEquals("Hello Universe.", ctx.getGroovyResult());
	//we'll change the source file of the Groovy interface that our class implements and uses to derive
	//the return value of the helloWorld method
	modifyGroovyFile(
			"com/fg/mock/GroovyHelloWorldInterface.groovy",
			"HELLO_UNIVERSE = \"Hello Universe.\";",
			"HELLO_UNIVERSE = \"Bye Universe.\";"
	);
	//the Groovy class loader doesn't recompile classes when classes they depend on change
	GroovyContext newCtx = createInstanceAndCallMethod(clsLoader, "com.fg.mock.MainGroovyClass", "helloWorld");
	assertEquals("Hello Universe.", newCtx.getGroovyResult());
}

A guaranteed path to PermGen space leaks

We can deduce from the class loading behaviour documented above that the path to PermGen space leaks is straight and easy. When you create instances of Groovy classes, store them in long-living memory scopes (such as static variables, singletons, or the servlet context or session scopes in web applications) and change the source code of the Groovy classes frequently, letting Groovy recompile them on the fly, you will likely end up with an OutOfMemoryError in PermGen space. Moreover, there will be a bunch of instances based on obsolete source code from different modification stages. If you don't believe me, look at the following test:


/**
 * Run with: -XX:MaxPermSize=1m -Xmx4M
 * @throws Exception
 */
public void testOutOfMemory() throws Exception {
	int iterations = 1000;
	//uncomment this to have a long and happy life
	//WeakHashMap classes = new WeakHashMap(iterations);
	//WeakHashMap instances = new WeakHashMap(iterations);
	//strong references like these end in an OOME
	HashMap classes = new HashMap(iterations);
	HashMap instances = new HashMap(iterations);
	//test map for observing garbage collection of GroovyScriptEngine instances
	WeakHashMap engines = new WeakHashMap(iterations);
	//a thousand iterations is enough to fail with 1MB of PermGen space
	int i = 0;
	try {
		for (; i < iterations; i++) {
			//keep modifying source code file
			modifyGroovyFile(
					"com/fg/mock/MainGroovyClass.groovy",
					"return .+;",
					"return \"Hello world " + i + "\";"
			);
			//create a new engine each time (it would soon fail even with a single engine)
			engine = new GroovyScriptEngine(
					new URL[] {rootFSR.getURL()}
			);
			engines.put(engine, Boolean.TRUE);
			//new instances will use new definition
			GroovyContext ctx = createInstanceAndCallMethod(engine, "com.fg.mock.MainGroovyClass", "helloWorld");
			//store instances in long living scope
			classes.put(ctx.getGroovyClass(), Boolean.TRUE);
			instances.put(ctx.getGroovyInstance(), Boolean.TRUE);
			//every hundred iterations, run GC manually (not necessary, but it makes the following output more precise)
			if (i % 100 == 0) {
				System.gc();
				System.out.println(
						(i + 1) + ": retained classes - " + classes.size() +
								", retained instances - " + instances.size() +
								", retained engines - " + engines.size()
				);
			}
		}
	} catch(OutOfMemoryError e) {
		//here we are - with the OOME
		String msg = "Out of memory with classes - " + classes.size() +
				", retained instances - " + instances.size() +
				", retained engines - " + engines.size() +
				" on " + (i + 1) + " iterations.";
		System.out.println(msg);
		fail(msg);
	}
}

A safe way around memory leaks

It seems that the only way to safely avoid memory leaks with Groovy recompilation is not to work directly with references to Groovy instances. You should use the approach explained below every time you need to store a reference to a Groovy object in a longer-living memory scope, such as:

  • static variables of Java classes *)
  • global variables of singleton Java instances *)
  • ServletContext or Session
  • local variables of long-running methods (in daemon threads and so on)
  • maybe other places that haven't come to my mind

*) A Java instance or Java class here means a class loaded by a Java class loader (one of the parent class loaders of the Groovy class loader)

In such cases you should wrap the reference to the Groovy instance in another object managed by the Java class loader. A simple WeakReference would suffice, but it has at least two big disadvantages:

  1. you cannot work with the Groovy instance transparently. Each time you want to call a method on a Groovy instance, you must retrieve the enclosing WeakReference and unwrap the Groovy instance by calling its get() method
  2. you must keep the instance alive some other way, as a WeakReference doesn't guarantee that the Groovy instance won't get garbage collected (so there must be another hard reference to it, for example in static variables of classes loaded by the Groovy class loader)
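
Both disadvantages show up even in the smallest wrapper. This is a rough sketch only: WeakScriptHandle is a hypothetical class (part of neither Groovy nor Spring), and a plain Supplier stands in for the Groovy instance.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

final class WeakScriptHandle {

    // The only link from the Java side to the script instance is weak.
    private final WeakReference<Supplier<String>> ref;

    WeakScriptHandle(Supplier<String> scriptInstance) {
        this.ref = new WeakReference<>(scriptInstance);
    }

    /** Disadvantage 1: every call has to unwrap the reference first. */
    String call() {
        Supplier<String> target = ref.get();
        if (target == null) {
            // Disadvantage 2: nothing in this class keeps the instance alive.
            throw new IllegalStateException("script instance was garbage collected");
        }
        return target.get();
    }

    public static void main(String[] args) {
        // Stand-in for a Groovy instance; this local variable is the strong reference.
        Supplier<String> instance = () -> "Hello Universe.";
        WeakScriptHandle handle = new WeakScriptHandle(instance);
        System.out.println(handle.call());
        // Using the instance afterwards keeps it strongly reachable during the call above.
        System.out.println(instance.get());
    }
}
```

The caller never touches the instance directly, and the handle stays usable only for as long as somebody else holds a strong reference to the wrapped object.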

On the other hand, you are safe to work with direct references to Groovy instances in:

  • static variables of Groovy classes
  • global variables of "singleton" Groovy instances

You are completely safe to work with references as long as they don't leave the scope of the Groovy class loader (as in both cases mentioned above). Provided no reference to a Groovy instance has escaped from the Groovy scope, the GC can free all Groovy instances and classes as soon as you drop the reference to the GroovyClassLoader that created them.

Spring Framework's solution confirms the idea

The basic clues for my research come from Spring's Groovy scripting implementation. Spring wraps all Groovy-based beans in a JDK proxy that can check the original source code for changes at specified intervals:


<lang:groovy id="mainGroovyClass"
                 refresh-check-delay="0"
                 script-source="file:${project.build.directory}/target/test-classes/com/fg/mock/MainGroovyClass.groovy"/>
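
The proxy idea itself can be sketched with the plain JDK. This is not Spring's actual implementation, just an illustration under stated assumptions: RefreshableProxyDemo, Refreshable and Greeter are hypothetical names, and swap() stands in for "the source changed, recompile and create a new instance".

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicReference;

final class RefreshableProxyDemo {

    /** Plays the role of a Java-side interface such as JavaHelloWorldInterface. */
    interface Greeter {
        String sayHello();
    }

    /** Hands out one stable proxy and lets the real target behind it be replaced. */
    static final class Refreshable<T> {
        private final AtomicReference<T> target;
        private final T proxy;

        Refreshable(Class<T> type, T initial) {
            this.target = new AtomicReference<>(initial);
            InvocationHandler handler = (p, method, args) -> {
                try {
                    // Always delegate to whatever target is current at call time.
                    return method.invoke(target.get(), args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();
                }
            };
            this.proxy = type.cast(
                    Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler));
        }

        T proxy() {
            return proxy;
        }

        void swap(T replacement) {
            target.set(replacement);
        }
    }

    public static void main(String[] args) {
        Refreshable<Greeter> bean =
                new Refreshable<Greeter>(Greeter.class, () -> "Hello Universe.");
        Greeter reference = bean.proxy();    // the long-lived reference callers keep
        System.out.println(reference.sayHello());
        bean.swap(() -> "Bye bye");          // simulated refresh
        System.out.println(reference.sayHello());
    }
}
```

Callers hold only the proxy, which is loaded on the Java side, so the old target and its class become unreachable after a swap while existing references immediately see the new behaviour.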

The Spring implementation, though very clever, has some limitations, and I wonder what motivations lie behind the chosen solution. Maybe the fact that the same scripting layer is also used for other scripting libraries such as JRuby or BeanShell plays a role.

The limitations are:

  • a JDK proxy allows you to call only methods declared on Java interfaces (that is, interfaces loaded by the Java class loader)
    - this is not much of a burden as long as you work with those instances in plain Java classes, because there you would need reflection anyway to reach methods not declared on the Java interfaces. But when you pass your instances to a templating solution such as Velocity or FreeMarker, where reflection is used on every call, it's more than desirable to be able to call Groovy methods directly without having to declare them on some Java interface.
  • Spring reflects changes only in beans declared in Spring's application context. When the class behind a Groovy bean directly instantiates another Groovy class and the source code of this "referenced" Groovy class changes, your bean won't notice it, because Spring watches only the source code of the beans themselves.
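
The first limitation is easy to reproduce with the plain JDK. In this sketch (all names are hypothetical; ScriptBean stands in for a Groovy class and hasPublicMethod mimics the reflective lookup a template engine performs) the extra method is visible on the real object but not on its proxy.

```java
import java.lang.reflect.Proxy;

final class ProxyVisibilityDemo {

    /** Stands in for an interface declared on the Java side. */
    interface Greeter {
        String sayHello();
    }

    /** Stands in for a Groovy class that offers more than the interface declares. */
    static final class ScriptBean implements Greeter {
        public String sayHello() {
            return "Hello Universe.";
        }

        public String sayGoodbye() {   // not declared on Greeter
            return "Bye bye";
        }
    }

    /** Wraps the target in a JDK proxy that implements only Greeter. */
    static Greeter proxyFor(Greeter target) {
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                (p, method, args) -> method.invoke(target, args));
    }

    /** Mimics a template engine: look up a public no-argument method by name. */
    static boolean hasPublicMethod(Object bean, String name) {
        try {
            bean.getClass().getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ScriptBean target = new ScriptBean();
        Greeter proxy = proxyFor(target);
        System.out.println(hasPublicMethod(target, "sayGoodbye"));
        System.out.println(hasPublicMethod(proxy, "sayGoodbye"));
        System.out.println(hasPublicMethod(proxy, "sayHello"));
    }
}
```

This prints true, false, true: the proxy class implements the interface and nothing else, so a reflective caller cannot find sayGoodbye on it.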

Again, I can prove those limitations with the following tests:


public void testReloadLoadGroovyClass() throws Exception {
	JavaHelloWorldInterface myGroovyInstance = (JavaHelloWorldInterface)ctx.getBean("mainGroovyClass");
	assertEquals("Hello Universe.", myGroovyInstance.sayHello());
	modifyGroovyFile(
			"com/fg/mock/MainGroovyClass.groovy",
			"return HELLO_UNIVERSE;",
			"return \"Bye bye\";"
	);
	//Spring will take care of refresh
	assertEquals("Bye bye", myGroovyInstance.sayHello());
}
public void testReloadLoadGroovyDependentClass() throws Exception {
	JavaHelloWorldInterface myGroovyInstance = (JavaHelloWorldInterface)ctx.getBean("mainGroovyClass");
	assertEquals("Hello Universe.", myGroovyInstance.sayHello());
	modifyGroovyFile(
			"com/fg/mock/GroovyHelloWorldInterface.groovy",
			"HELLO_UNIVERSE = \"Hello Universe.\";",
			"HELLO_UNIVERSE = \"Bye Universe.\";"
	);
	//Spring DOESN'T take dependent classes into account
	assertEquals("Hello Universe.", myGroovyInstance.sayHello());
}
public void testReloadGroovyDependentClassVerifyInstance() throws Exception {
	JavaHelloWorldInterface myGroovyInstance = (JavaHelloWorldInterface)ctx.getBean("referencingGroovyClass");
	assertEquals("Hello Universe.", myGroovyInstance.sayHello());
	modifyGroovyFile(
			"com/fg/mock/MainGroovyClass.groovy",
			"return HELLO_UNIVERSE;",
			"return \"Hei universumissa.\";"
	);
	//Spring can refresh underlying bean even for existing instances (proxy plays its role)
	assertEquals("Hei universumissa.", myGroovyInstance.sayHello());
}

Proof of concept

These findings and proposed solutions were deduced by studying the code of Groovy and Spring and are backed by a set of prototyping unit tests. As a novice in Groovy I cannot present my deductions as rock solid, so you're welcome to download the tests and either confirm or refute my conclusions.

Download ZIP with sources and tests

What comes next ...

In the next part of this "Groovy integration series" I will present a solution for programmatic Groovy object instantiation that will allow you to:

  • work with references to Groovy instances safely throughout your codebase, even storing them in long-living scopes
  • keep profiting from Groovy's ability to refresh classes at runtime
  • access any public method / variable declared on a Groovy class from outside (e.g. from any templating solution)

So, stay tuned!