Thursday, 2 January 2014

Gatling-vs-JMeter-FACT-CHECKING

Performance of Gatling vs JMeter: FACT CHECKING

"all competitive benchmarking is institutionalized cheating." Guerrilla Manifesto

Fact

"Gatling has much better performances than JMeter, See for yourself!", and the following 2 graphs are shown:

Gatling 1.3.2:

JMeter 2.8:


Context

I have been following with interest the hype around Gatling, which has been running for a little over a year.
The tool looked interesting (asynchronous system, reactor pattern...). I used it a bit at home and on a non-critical project to see it at work, but I did not continue using it in a professional context because of its highly unstable API, its limited features compared to JMeter, and an incomplete HTTP protocol implementation at the time I started with Gatling 1.
The HTTP protocol implementation now seems close to complete, but the other issues mentioned remain.
Note that I wrote about my experience and thoughts on Gatling in a previous blog post.
This page presents a benchmark made by the Gatling team, which is supposed to be the equivalent of a benchmark run by the Apache JMeter team and mentioned on the JMeter wiki:

I don't know what the exact intention of this page was. Was it to discredit JMeter by suggesting that it did not inject load on Tomcat in a stable way, the conclusion being "how can you trust the results, guys?" It seems so:

Slide content:





The Mystery
One thing that always disturbed me was the very neat look of their "Number of transactions per second" graph:


I don't know why, but given my experience and technical skills in load testing, it always looked suspicious to me.

Another thing that annoyed me and looked very unscientific was the statement on the benchmark page that the benchmark was run in "quite similar conditions", namely:
  • same local Tomcat 6.0.24 with same heap options => OK
  • Mac OS X 10.8.2 => KO, JMeter benchmark mentions 10.6.8
  • Hotspot 1.6.0_35 => KO, JMeter benchmark mentions 1.6.0_29
  • 2.3 GHz Intel Core i7 proc => KO, JMeter benchmark mentions 3.06 GHz Intel Core 2 Duo
  • default Gatling JVM options (512 MB heap) => OK
Note also that the benchmark strangely uses a pretty old version of Gatling, 1.3.2, even though the page was updated after newer versions had been released.

Among all these "similar conditions", the key difference that bothered me was the processor: the Intel Core i7 (used in the Gatling benchmark) is 50% faster than the Intel Core 2 Duo (used in the JMeter benchmark):
I recently exchanged tweets with Stéphane Landelle (developer of Gatling) about it. His answer was that he had run the test on an old machine but forgot to update the website. I must say this gave me the idea to take some time and work on the benchmark myself:

Back when Gatling 1.3.2 was current, I ran the same benchmark on my machine, which is similar to the one used in the JMeter benchmark. What disturbed me was that the Tomcat instance behaved differently depending on whether Gatling or JMeter was running the test: far fewer connector threads were started when Gatling was the load test engine.

I ran the test again with 1.4.2 and still got this difference, and always that perfect graph.

So I said to myself, man you're stupid, there's a mystery you don't understand...

Mystery Uncovered:

This suddenly became clear to me with the release of Gatling 2 and the backport to Gatling 1.5.0, whose release notes say:
  • In Gatling 1 (before 1.5.0), connections are shared amongst users. This behavior does not match real browsers, and doesn't support SSL session tracking.
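To see why connection sharing changes what Tomcat experiences, here is a rough model, not a measurement. It assumes that a shared pool only needs as many connections as there are requests in flight at the same instant, while a browser-like client keeps one connection open per user for the whole session. The `User` class, the helper names, and all timings (100 users, 50 ms per request, 1 s think time) are made up for illustration and come from neither benchmark.

```python
from dataclasses import dataclass


@dataclass
class User:
    start_ms: int    # when the virtual user starts (hypothetical value)
    requests: int    # number of requests in the scenario
    service_ms: int  # time each request spends in flight
    think_ms: int    # pause between two requests


def request_intervals(user):
    """Half-open [start, end) intervals during which a request is in flight."""
    t = user.start_ms
    intervals = []
    for _ in range(user.requests):
        intervals.append((t, t + user.service_ms))
        t += user.service_ms + user.think_ms
    return intervals


def session_interval(user):
    """Interval from the first request start to the last request end."""
    reqs = request_intervals(user)
    return (reqs[0][0], reqs[-1][1])


def peak_overlap(intervals):
    """Maximum number of intervals that are open at the same time."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort by time, then delta, so a close (-1) is processed
    # before an open (+1) happening at the same instant.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Assumed scenario: 100 users, one new user every 10 ms.
users = [User(start_ms=10 * i, requests=5, service_ms=50, think_ms=1000)
         for i in range(100)]

# Shared connections: one connection per request currently in flight.
shared_peak = peak_overlap([iv for u in users for iv in request_intervals(u)])

# One connection per user, kept open during think time as well.
per_user_peak = peak_overlap([session_interval(u) for u in users])

print("shared pool peak connections:", shared_peak)
print("per-user peak connections:", per_user_peak)
```

With these made-up numbers the shared pool peaks at 5 connections while the per-user model peaks at 100, which is the kind of gap that would show up as far fewer connector threads on the Tomcat side.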
Then I had to take some time to run the test and write this blog post.


The TEST:

Benchmark conditions are the same as on the JMeter page http://wiki.apache.org/jmeter/JMeterPerformance , except for the items in BOLD below:
  • Tomcat version 6.0.24
  • Tomcat JVM : -Xms256m -Xmx1024m
  • JMeter JVM : -Xmx512m (Default options) plus the same GC algorithm as Gatling:
    • -XX:+UseParNewGC 
    • -XX:+UseConcMarkSweepGC 
    • -XX:+CMSParallelRemarkEnabled 
  • Gatling JVM :  -Xmx512m (Default options)
  • Set session timeout in web.xml to 1 minute

In web.xml, add this:

<session-config>
    <session-timeout>1</session-timeout>
</session-config>

Software versions:
Scripts:
  • JMeter: the same one mentioned here http://wiki.apache.org/jmeter/JMeterPerformance
  • Gatling: I added a disableCaching call to match the conditions of the JMeter benchmark, which does not use caching (there is no HTTP Cache Manager in the Test Plan). The script on the Gatling website worked in Gatling 1.3.2 because there was also a bug in caching; with 1.5.3 it currently fails on the status 200 check, as one GET gets cached (so a 304 response).

Gatling script (in red, the only modification I made):


The TRUTH

Results:

Gatling 1.5.3:

As you can see, the graph is a bit different from the one on the Gatling home page: less perfect!

Now let's zoom in to match the JMeter conditions (no ramp down):

Wow, the graph looks much less "PERFECT". Unzooming is tricky, yes!



JMeter 2.11:
This is the result of using the great jmeter-plugins.
To get the same display conditions as Gatling, which plots one point per second, I set the option "Limit number of points in row" to 607 (the test lasts 10 minutes and 7 seconds).
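The arithmetic behind that setting, and what "one point per second" means, can be sketched as follows. This is only an illustration of the bucketing idea under the assumption that sample timestamps are in milliseconds; it is not the jmeter-plugins or Gatling implementation, and the function names and sample values are hypothetical.

```python
from collections import Counter


def points_for(minutes, seconds):
    """Number of one-second points needed to cover the whole test."""
    return minutes * 60 + seconds


def transactions_per_second(timestamps_ms, start_ms, duration_s):
    """Count completed samples in each one-second bucket of the test."""
    buckets = Counter((ts - start_ms) // 1000 for ts in timestamps_ms)
    # Seconds without any sample still get a point, with a value of 0.
    return [buckets.get(second, 0) for second in range(duration_s)]


# A 10 minute 7 second test needs 607 points at one point per second.
print(points_for(10, 7))

# Made-up sample completion times (ms since test start), 3 second window.
print(transactions_per_second([0, 10, 999, 1000, 2500], 0, 3))
```

Using fewer points than seconds would average neighbouring buckets together and smooth the curve, which is why both tools should be compared at the same resolution.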



And here is what we get:



Conclusion:

Now let's scale the JMeter graph (bottom graph) to the same display ratio as the Gatling one (top graph), and here is what we get:




WOW, no graph is perfect and they are pretty much the same!!!!

Lessons learned:

  • Beware of benchmark conditions and what you think are "quite similar conditions"
  • Beware of magical tools and perfect graphs in the load testing world:
    • Connection sharing gives better performance and cleaner graphs, YEAH, BUT it is not realistic. Hey man, we don't want to load test your load test engine, we want to load test the customer application!
  • Beware of reports:
    • Unzoom: when something is added (ramp down), it distorts the perception
    • Graph proportions: they distort the perception
  • Beware of hipsters, "cool nerds", don't be a sheep and never forget Saint Thomas:
    • I'll believe that when I see it!
My motto goes even further:
  • I'll believe that when I test it!

Now don't believe me, and go do the test for yourself, girls and guys!


UPDATE 03 January 2014:


  • The High Performance link has disappeared from the home page, but the Internet is great: you can still find an old version here:
I made a screenshot of the page as of 2 January 2014:


  • The benchmark page content has changed. I am happy I have been so convincing! :-) You can see the GitHub page history:

I had made some screenshots of its content:





Regarding what is said on the new benchmark page, the flawed bench is due to 4 factors:
  • Unrealistic behavior due to connection sharing at the time of 1.3.2
  • A bug in the caching feature, as the benchmark page now mentions
  • A wrong comparison of reports, because the additional ramp down part unzooms the graph
  • Different graph proportions which favored Gatling, its graph height being small
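
The last two factors are purely about how the data is drawn. As a back-of-the-envelope sketch (the function names and every number below are invented for illustration, not taken from either report), the same throughput swing occupies fewer pixels on a shorter chart, and the steady-state phase gets less horizontal room once a ramp down is appended to the time axis:

```python
def swing_in_pixels(values, axis_min, axis_max, plot_height_px):
    """Vertical pixels covered by the min-to-max swing of a series."""
    swing = max(values) - min(values)
    return swing * plot_height_px / (axis_max - axis_min)


def steady_state_width_px(plot_width_px, steady_s, ramp_down_s):
    """Horizontal pixels left for the steady-state part of the test."""
    return plot_width_px * steady_s / (steady_s + ramp_down_s)


# Hypothetical throughput samples oscillating between 900 and 1100 req/s.
throughput = [900, 1000, 1100]

# Same data, same 0..1200 axis, two chart heights.
print(swing_in_pixels(throughput, 0, 1200, 120))  # short chart
print(swing_in_pixels(throughput, 0, 1200, 480))  # tall chart

# Same 600 px wide chart, with and without a 200 s ramp down after 600 s.
print(steady_state_width_px(600, 600, 0))
print(steady_state_width_px(600, 600, 200))
```

In this made-up case the swing shrinks from 80 to 20 pixels on the shorter chart, and the steady-state phase loses a quarter of its width to the ramp down, so identical data looks flatter and smoother.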

What I find funny is that a page made to discredit JMeter, and used across so many presentations including Devoxx, JUGs and Duchess events, ends up kind of discrediting Gatling:
"Hung by his own rope"