Archive for the ‘Java’ Category

When you checked out multiple git-repositories and you want to update them all at once, you can execute the following line:

$ ls -d */ | xargs -I{} git -C {} pull

Reduce noise in an onejar FatJar

Posted: May 15, 2015 in Java
Tags: ,

When you’re using onejar 1.4.5 or above you want to turn off the noise the bootloader makes.

java -Done-jar.verbose=false -jar service.jar

Vertx3 is nearing its release date and we’re experimenting with the upcoming release to minimize future surprises. One of the major changes is the generation of fatjars. The Vertx2 fatJar plugin doesn’t exist anymore in the Vertx 3 world. Now Vertx 3 uses the maven-shade-plugin to generate the fat JAR. Unfortunately, the shader has its own problems. For example, it is very cumbersome to handle duplicate files especially in transitive dependencies which are something out of your control.

When you’re running into unsolvable errors with the maven-shade-plugin, you can easily switch to the onejar-maven-plugin. Configure your POM as follows:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <version>2.6</version>
    <configuration>
        <archive>
            <manifestEntries>
                <Main-Class>io.vertx.core.Starter</Main-Class>
                <Main-Verticle>io.vertx.example.HelloWorldVerticle</Main-Verticle>
            </manifestEntries>
        </archive>
    </configuration>
</plugin>
<plugin>
    <groupId>org.dstovall</groupId>
    <artifactId>onejar-maven-plugin</artifactId>
    <version>1.4.5</version>
    <executions>
        <execution>
            <configuration>
                <attachToBuild>true</attachToBuild>
                <filename>service.jar</filename>
            </configuration>
            <goals>
                <goal>one-jar</goal>
            </goals>
        </execution>
    </executions>
</plugin>

This will create a new fat JAR with the project JARs as a complete file on the inside. They are not unpacked, so there are no issues with overwriting files in the META-INF. The downside is that they use an own boot-mechanism.

This again has some problems when for example certain JARs use the SPI mechanism or self-cooked mechanisms to load classes.

Easy streaming in Vertx.io

Posted: March 9, 2015 in Java
Tags: ,

If you want to stream an incoming request, you can use the following code:

    final Handler<HttpServerRequest> streamer = (HttpServerRequest request) -> {
        HttpServerResponse response = request.response();

        container.logger().info("Streaming... ");
        long ts = System.currentTimeMillis();

        // handle the content
        request.dataHandler((Buffer data) -> {
            container.logger().info("Received " + data.length());

        });

        request.endHandler(v -> {
            container.logger().info("Done! " + (System.currentTimeMillis() - ts));
            request.response().end();
        });

        // handle upload errors
        request.exceptionHandler(
                (Throwable throwable) -> {
                    throwable.printStackTrace();
                    response.setStatusCode(HttpResponseStatus.BAD_REQUEST.code()).end();
                }
        );
    };

The Vertx 2 “multithreaded” Option

Posted: January 15, 2015 in Java
Tags: ,

The current threading model in Vertx 2 is very powerful. So powerful that you can easily put yourself out of business. In a project I’m currently working on, we created one (micro)service which does no more than store data in a MongoDB instance and creates a in-memory Lucene to provide a search functionallity. The functionality was trivial. There is a REST interface to do basic CRUD operation, a search resource and one special feature: an importer.

The search-resource contains nothing more than code to publish a message to a Lucene worker-verticle. This worker-verticle uses a LocalHandler to listen for events. And all worked well, since we had more then +2000 wildcard-search-operation per seconds. Internally we have 20 worker threads.

As we went on with the development, we encountered a problem during the import-process last week. A legacy system uploads a file to the import resource of the service. The importer receives the file, validates the content, and starts the import. The data was first stored in the MongoDB and when this call was finished, the same data was added to the Lucene index across the cluster.

One problem however was that all the worker-threads were blocked during the import-process leaving no room for parallel requests. After a brief search we saw that MongoDB-verticle was consuming all the worker-threads leaving no threads available for the Lucene-Index-worker. Like this, the node is not reachable and we had the negative effect that the supervision-service was marking this node as being down. We had to reduce the number of threads of the MongoDB-verticle. But, looking in the GitHub code of this verticle, we saw that the module had the property “multithreaded”: true in the mod.json. This is sometimes good, but in our case bad. We’ve forked this project and made the verticle no longer multithreaded. To have enough capacity during the import, we’ve instantiated the MongoDB-verticle five times. We currently run with 20 worker threads, so we always have room to keep the search function alive.

Lessons learned is that even in predefined modules, you need to fine-tune the threading-strategy.

It is not unusual to use MySQL’s YEARWEEK() function to create identifiers for weeks within years. The problem is well-known. You need to store the week number for a certain year. Storing only the week-number decouples your data from the real year. You need to store the year too and that’s when YEARWEEK kicks in. But beware, there are some pitfalls!

MySQL

The YEARWEEK()-function gives us something like 201304 for the fourth week of 2013. But that’s only half the story. Problems arise when you want to know the week-number for 31.12.2012. This could be 201253 or 201301, depending on how you see it. The week could start on monday, or sunday and the first week of the year starts after 4 days or not.

There is an agreement on the week-calculation. It defined as ISO 8601. The first weeks must have at least 4 days and starts on a monday. See http://en.wikipedia.org/wiki/ISO_8601 for more information.

Unfortunately, MySQL uses mode “0” for the week-calculation. It is not the ISO 8601 norm. You must set MySQL to use mode “3” (default_week_format) or pass it as a parameter in the YEARWEEK() function.

Mode First day of week Range Week 1 is the first week …
0 Sunday 0-53 with a Sunday in this year
1 Monday 0-53 with more than 3 days this year
2 Sunday 1-53 with a Sunday in this year
3 Monday 1-53 with more than 3 days this year
4 Sunday 0-53 with more than 3 days this year
5 Monday 0-53 with a Monday in this year
6 Sunday 1-53 with more than 3 days this year
7 Monday 1-53 with a Monday in this year

So, the following problems are solved:

select YEARWEEK("2012-12-31");

gives us 201253.<br />

select YEARWEEK("2012-12-31", 3);

gives us 201301 which adheres to the ISO 8601.

Ok, this problem seems solved. The database seems to have the dates correct. But if I want to query the week, I cannot always rely on the database to calculate me the correct date. Sure, I could do round-trips to the server sending a date and receiving the correct week number. But that’s slow.

Java

Let’s try to rebuild the YEARWEEK() in Java using the ISO 8601 norm. The Calendar-class is not the solution we’re looking for. Sure, you can get the week, but you can’t get the correct year for that week when you’re in the  ISO 8601 mode. For example, for 2012-12-31 you get the week 01 but the year 2012, resulting in 201201 for the last week of the year. Which is of course, incorrect!

The package Joda helps us out and provides the solution. Out of legacy-grounds, the API works with a calendar object. Joda provides us the correct week in the year and the correct year for the that week (even though the real year is different).

I wrote a test-class with the Java-method to generate the values.

import org.joda.time.DateTime;

// ....

/**
* Return the ISO 8601 format of YEARWEEK with a given calendar.
*/
public int from(final Calendar calendar) {
    DateTime dt = new DateTime(calendar) ;
    return dt.weekyear().get() * 100 + dt.weekOfWeekyear().get();
}

I’ve tested the results agains the MySQL YEARWEEK for 12 years and all seems to work fine!

I am experimenting with Glassfish 4 to prepare in order to move some application from J2EE6 to J2EE7. Glassfish 4 works with J2EE 7 and introduces some new concepts. The one which we are investigating today is the use of @Asynchronous and @Suspended in REST resources.

The use of the asynchronous annotations is pretty well specified for beans but there are some pitfalls when it comes to REST services. Let’s go through an example and check the behavior of a REST service with asynchronous methods as we go.

The use case is the following. We define a Resource and one Bean. The resource is a typical REST-resource with only one GET-method. The bean is a normal, stateless session bean which performs a long running task. This bean is pretty straightforward. We do not annotate the methods of the bean as asynchronous.

@Stateless
public class LongSession {
   public String doLongTask() {
       try {
           // do a long task
           Thread.sleep(10*1000);
       } catch (InterruptedException ex) {
           // ignore
       }
       
       return "done";
   }
}

The first resource we build is pretty simple. We create an @Stateless resource and inject a @Suspended AsyncResponse. AsyncResponse takes care of the asynchronous response when the results become available.

@Path("as")
@Stateless
public class AsResource {

    @EJB
    private LongSession longSession;
    
    @GET
    @Produces(MediaType.TEXT_HTML)
    @Asynchronous
    public void getXml(@Suspended AsyncResponse resp) {
        System.out.println("do a long task in thread: " + Thread.currentThread().getName());
        String result = this.longSession.doLongTask();
        Response r = Response.ok("this is the " + result).build();
        resp.resume(r);
    }
 
}

When we open the browser and point to the URL of this resource we will see that the request is blocked for 10 seconds! But that is nothing new. Now, the problem is that we want to know how many threads are available to serve these requests. We are using a stateless bean, so we use the thread-pool of the ejb-container. This has some implications. When we use the http-thread-pool we will basically the same behavior, but we are not interested in the request thread. I want to scale out the ejb-pool. Adding beans to the pool is apparently not enough.

When you press F5 continuously in the browser, you will see something like “INFO: do a long task in thread: __ejb-thread-pool1” in the log. This number counts up and exceeds the threads in the http-thread-pool thanks to the AsyncResponse and @Suspended. But you will see (in a freshly installed domain) that the thread-pool does not exceed the limit of 16 even though you have max. 64 beans in your pool. We need to finetune the thread-pool of the ejb-container. But, you won’t find any properties in the administrator console. You need to add them yourself. Open the domain.xml of your domain and add the following lines:

     <ejb-container max-pool-size="64" steady-pool-size="7">
        <property name="thread-core-pool-size" value="10"></property>
        <property name="thread-max-pool-size" value="20"></property>
        <property name="thread-queue-capacity" value="25"></property>
        <ejb-timer-service></ejb-timer-service>
      </ejb-container>

Now, rerun your application. You will see that the thread-pool goes up to 10 when you press F5 in the browser without holding it down. It seems to stagnate on 10 although you kinda specified a max-pool-size of 20. When you continuously press F5 you will suddenly see the threads go up to 20 before throwing an java.util.concurrent.RejectedExecutionException. Nice, but what the hell happened?

Let’s dig deeper in the documentation of the thread pools:

thread-core-pool-size: Specifies the number of core threads in the EJB container’s common thread pool. The default value is 16. Great, there we have our number 16. Setting this to 10 oder 100 will change the actual number of threads doing some work.

thread-max-pool-size: Specifies the maximum number of threads in the EJB container’s common thread pool. The default value is 32. Nice, increasing this to 100 will be the maximum number of threads we can use? Yes and no. You have to consider the default-value of thread-queue-capacity.

thread-queue-capacity: Specifies the size of the thread pool queue, which stores new requests if more than thread-core-pool-size threads are running. The default value is the Integer.MAX_VALUE.

Here starts the confusion. Your queue-capacity is way too high. Pressing F5 will never reach MAX_VALUE, so your core-pool-size never change nor scale. You must limit your capacity first before the thread-pool is scaled with maximal max-pool-size threads. In our example, we will scale when we reach 25 waiting requests. It will scale up to 20. When all threads are used in parallel the container throws an exception.

In the past SUN declared correctly: “That is exactly how it is supposed to behave. First the threads grow to coreSize, then the queue is used, then *if* the queue fills up then the number of threads expands from coreSize to maxSize. Hence if you use an unbounded queue the last part never happens. This is all described in the documentation. If you want an unbounded queue but more threads then increase the core size. Otherwise consider whether a bounded queue is more suitable to your needs.”

Some extra information can be found here https://java.net/jira/browse/GLASSFISH-17735 and http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html.