Tuesday 27 September 2011

Bash script to extract a range of revision diffs

A quick Bash script hack to extract all CVS/SVN diffs from a range of revisions:

for start_ver in {1..25};
do end_ver=$((start_ver + 1));
cvs diff -kk -c -r 1.$start_ver -r 1.$end_ver source.file >> diff_output.txt;
done;

This script will incrementally extract all differences between version 1.1 and 1.26 and write them to the diff_output.txt file. Incremental in this context means changes between each revision, so the output will contain the diff between version 1.1 and 1.2 followed by the diff between 1.2 and 1.3, then 1.3 and 1.4 etc.

A few notes (as always):

  • The script should be run in the directory containing the file the diff is being run for (source.file in the example)
  • The physical revision numbers can be replaced with tags (assuming sequential numerical naming of the tags)
  • The diff options (-kk -c) can be changed for the required diff output format
  • This was a quick hack and can almost certainly be improved

Thursday 8 September 2011

Generating SQL insert commands using SELECT

A quick hack for generating a list of SQL insert commands from an existing data set. Useful for copying specific data between databases/tables.

select 'insert into DestTable values(''' + Value1 + ''',''' + Value2 + ''')' from SourceTable

This query will return a result set in the format:
insert into DestTable values('a','b')
insert into DestTable values('x','y')

A few notes:
  • The '+' might need to be replaced with a concat function on certain databases.
  • A convert or cast operation may be required for non-character values
  • The result set can be limited by adding a where clause to the select query
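The same statements can of course be built in application code rather than in the database. A minimal Java sketch (the table and column values are illustrative), doubling embedded single quotes the way SQL string literals require:

```java
import java.util.ArrayList;
import java.util.List;

class InsertGenerator {
    // Build insert statements for rows of string values, doubling any
    // embedded single quotes (the same escaping the SQL above relies on).
    static List<String> generate(String table, List<String[]> rows) {
        List<String> statements = new ArrayList<>();
        for (String[] row : rows) {
            StringBuilder sb = new StringBuilder("insert into ")
                    .append(table).append(" values(");
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(",");
                sb.append("'").append(row[i].replace("'", "''")).append("'");
            }
            sb.append(")");
            statements.add(sb.toString());
        }
        return statements;
    }
}
```

Calling generate("DestTable", rows) for rows {"a","b"} and {"x","y"} produces the same output as the select query above.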

Wednesday 20 July 2011

Forcing XStream to use reflection for serialization

XStream is a fantastic library for serialising objects to XML. This (very short) post explains how to force reflection on classes implementing the Externalizable interface. The code below shows the typical usage of XStream to serialise an object:

XStream xstream = new XStream();
System.out.println(xstream.toXML(someObject));

Unfortunately if someObject implements Externalizable then the output XML looks something like:
<com.blogspot.deplication.SomeObject>
    <boolean>true</boolean>
    <int>1234213</int>
    <int>20163</int>
    <int>0</int>
    <int>4211</int>
    <int>32321981</int>
    <int>1233</int>
    <boolean>true</boolean>
</com.blogspot.deplication.SomeObject>

To fix this, the priority of the ReflectionConverter needs to be increased as by default it is the last converter called when trying to serialise the object.
XStream xstream = new XStream();
ReflectionConverter reflectionConverter = new ReflectionConverter(
    new CachingMapper(xstream.getMapper()), xstream.getReflectionProvider());
xstream.registerConverter(reflectionConverter, XStream.PRIORITY_LOW);
System.out.println(xstream.toXML(someObject));

This code results in the more friendly XML output:

<com.blogspot.deplication.SomeObject>
    <__id>1234213</__id>
    <__tradeId>20163</__tradeId>
    <__linkedTradeId>0</__linkedTradeId>
    <__bookId>4211</__bookId>
    <__productId>32321981</__productId>
    <__eventConfigId>1233</__eventConfigId>
</com.blogspot.deplication.SomeObject>
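The opaque <boolean> and <int> elements in the default output come directly from the order of writes in writeExternal. For reference, a minimal Externalizable class (the fields here are illustrative, not the actual SomeObject) might look like this:

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// A minimal Externalizable class; the writeExternal order is what produces
// the anonymous <int>/<boolean> elements in the default XStream output.
class SomeObject implements Externalizable {
    private int id;
    private boolean active;

    public SomeObject() { }                  // no-arg constructor required

    public SomeObject(int id, boolean active) {
        this.id = id;
        this.active = active;
    }

    public int getId() { return id; }
    public boolean isActive() { return active; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeBoolean(active);            // becomes <boolean>
        out.writeInt(id);                    // becomes <int>
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        active = in.readBoolean();
        id = in.readInt();
    }
}
```

With a class like this, the default ExternalizableConverter simply replays the writeExternal stream, which is why the field names are lost until the ReflectionConverter takes priority.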

Monday 18 July 2011

Java application patch deployment

This post explains a few mechanisms for deploying patches to production Java applications whilst trying to minimise downtime and user impact.

Class path update
The simplest deployment is to create a new JAR file with the patched classes in the correct package structure and add it to the front of the class path used to start the application. So for example if your application was started with the following command:
java -cp application.jar:somelib.jar com.blogspot.deplication.MainClass
You could modify it as below to use the patched versions of your classes in the patch.jar file:
java -cp patch.jar:application.jar:somelib.jar com.blogspot.deplication.MainClass
This uses the class path ordering inherent to Java: the patch classes are loaded from patch.jar rather than application.jar because patch.jar occurs first in the class path. This is the simplest deployment and requires the application to be restarted with the new class path for the patch to take effect.

As the application needs to be restarted for this change to take effect, there is not much of a case for this approach over a full deployment. It does, however, allow you to override classes in one of the other JAR files, which may be useful for deploying a version of a class with additional logging (for example) without the need for a full deployment. Once the log data has been generated, the patch file can be removed and the application restarted with its original deployment.

Class file deployment
The second option is to deploy compiled class files to a directory specified in the class path. This approach is almost identical to the class path update method but does not require the command line or class path environment variable to be updated with the new patch JAR file name. So for example, starting the application with the following command:
java -cp .:application.jar:somelib.jar com.blogspot.deplication.MainClass
Would allow new classes to be copied to the correct package structure in the working directory and after restart the patch class would be used. In this case, copying a new MainClass to ./com/blogspot/deplication/ would result in the class in the directory being used instead of the version in the application.jar. Once again, this approach requires a restart to take effect and has the added complexity of keeping track of the individual class files rather than just the patch JAR file.

Dynamic class reloading
The third option is the implementation of a custom class loader to enable dynamic class reloading. Unfortunately this option requires code changes and imposes various restrictions, where the classes being loaded need to either extend a super class or implement an interface.
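A minimal sketch of the idea (the class and directory names are illustrative, not from a real framework): a fresh URLClassLoader is created over a patch directory, and discarding that loader and creating a new one is what allows a newer version of a class to be picked up without restarting the VM.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

class Reloader {
    // Create a fresh class loader over the patch directory and load the
    // named class through it; classes not found there are delegated to
    // the parent loader as usual.
    static Class<?> loadPatched(File patchDir, String className) throws Exception {
        URL[] urls = { patchDir.toURI().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls,
                Reloader.class.getClassLoader())) {
            return loader.loadClass(className);
        }
    }
}
```

The patched class would typically then be cast to an interface known to the parent loader before use, which is exactly the restriction mentioned above.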

Interesting points

  • The class path and class file deployment can be quite useful on distributed systems where only a single VM requires the patched files and the nodes can be stopped and restarted independently.
  • Remember to clean up the deployed classes/JARs when doing a full deployment to prevent them from overriding the new code
  • Solaris (and apparently AIX) may use memory-mapped JAR files; expect very strange behaviour if you replace the JAR files in a running VM on these platforms
  • For configurable applications that use reflection to load classes it is possible to deploy a patched class with a new (unique) name and configure the application to use the new class, thus avoiding the need for a restart
  • Client applications launched via Java Web Start can be updated by adding the patch JAR at the top of the resources section; the main="true" attribute must be excluded for the resource order to be honoured in Java 1.6
  • Overriding classes using the class path is far from ideal and should be avoided where possible
As always there are almost certainly different approaches, feel free to add comments with what has worked for you.

Saturday 16 July 2011

Bash script to automate report checking

The script below is a convenient hack to automate manual report checking. It requires uuencode and mailx (or equivalents) to be installed and uses the regular expression provided on the command line to check whether the number of matches reaches the specified count. This allows for a number of convenient checks:

  • Using "$" as the regex and a count of 1 prevents empty reports from being sent. The count can be changed to 2 if the report has a header line even when it has no results.
  • A specific condition can be matched, for example using a regex of "Match this text" to email the report only if the text appears in the report.

As always, there are multiple ways of solving the problem. This was a quick and convenient script to allow multiple reports to be checked and mailed to different recipients under different conditions.

#!/bin/bash
# Shell script for checking a named (text) report for a number of occurrences of a
# search string. If the search string is found at least the number of times specified
# then the report is mailed to the specified email list
#
# Usage example:
# This command will check the report.csv report for any entries ($REPORT_DIR needs
# to be set):
# ./check_report.sh report.csv "$" 1 my.email@address "Email Subject"

cd "$REPORT_DIR"

if [ -e "$1" ]; then
    COUNT=$(grep -c "$2" "$1")
    if [ "$COUNT" -ge "$3" ]; then
        uuencode "$1" "$1" | mailx -s "Check Report $5" "$4" > /dev/null
    fi
fi

Wednesday 13 July 2011

The importance of choosing the right data structure

A data structure is used for storing and organising data in a system. Selecting the correct data structure for a particular problem is an important part of software design and will influence the efficiency and maintainability of the code. This post demonstrates the importance of data structure selection with a contrived problem posed in a technical interview and a real world application.

The problem posed is as follows. Imagine you have a list of unsorted positive integers greater than zero. The list can be of arbitrary size but your algorithm must cater for extremely large lists (think millions or hundreds of millions of items). The list consists of items such that the number of occurrences of any particular integer is always even. One (and only one) integer in the list has an odd number of occurrences; how would you find that integer?

So for example, the list consisting of the following integers

5, 73, 8, 24, 24, 5, 73, 1, 24, 24, 5, 1, 8

has 5 as the number occurring an odd number of times (3 in this case but could be any odd number).

This is a great interview question as it allows the interviewer to get a feel for how an applicant approaches a problem and opens up various other branches of investigation. For example, the applicant's knowledge of algorithm efficiency (Big O notation) and sorting algorithms can be explored.

The key point in solving the problem revolves around how many times each of the items needs to be accessed to find a solution:
  • Every item needs to be visited at least once, no conclusion can be drawn until the last number is known.
  • There is no benefit to sorting the items, after sorting the problem is not solved and it will have incurred the additional cost of sorting
  • Using a map is not efficient for extremely large inputs and still requires an additional step for iterating over each of the unique values to check oddness

The ideal data structure to solve the problem is actually a Set. For each item, add it to the set if it is not already present; if it is already present, remove it. After processing all the items the set will contain the integer occurring an odd number of times. Ironically (for a post about data structures) there is an even simpler algebraic solution: performing a bitwise XOR (the ^ operator in Java) on all the input numbers yields the same answer without the use of a data structure. As stated above, this example is contrived specifically for use in technical interviews; a real world application of data structure selection follows.
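Both approaches can be sketched in a few lines of Java:

```java
import java.util.HashSet;
import java.util.Set;

class OddOneOut {
    // Set approach: toggle membership; the sole survivor is the integer
    // occurring an odd number of times.
    static int findWithSet(int[] values) {
        Set<Integer> pending = new HashSet<>();
        for (int v : values) {
            if (!pending.remove(v)) {   // remove returns false if absent...
                pending.add(v);         // ...so add it on the odd sighting
            }
        }
        return pending.iterator().next();
    }

    // XOR approach: pairs cancel each other out, leaving the odd occurrence.
    static int findWithXor(int[] values) {
        int result = 0;
        for (int v : values) {
            result ^= v;
        }
        return result;
    }
}
```

Running either method over the example list above returns 5.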

A functional requirement for a critical component of the system is that no two threads try to update an object (with a unique identifier) simultaneously. The system is multi-threaded and event driven, with certain events triggering the object updates. Simultaneous updates would result in a version exception and require manual intervention to correct. An initial (optimistic) solution is to add the synchronized keyword to the method responsible for performing the update, thus ensuring sequential execution of updates. While this approach should work as desired, it incurs the cost of class level locking and prevents any kind of concurrency, even for update requests to distinct objects which could actually be executed in parallel. This raises the non-functional requirement of the component, namely performance.

The component is responsible for cash movements in the system and at peak times can have thousands of events generated every second. While there is no real-time requirement, the delay introduced by the synchronized method could lead to the last events being processed minutes after they were received, a situation that was not acceptable to the business. After closer investigation it was identified that in the worst case only a fraction of the updates would ever attempt simultaneous updates. To restate the problem, how can duplicate requests be blocked while allowing distinct requests to process concurrently?

public class LockTest {
    protected static Set<Integer> ID_LOCK = Collections.synchronizedSet(new TreeSet<Integer>());

    public void performUpdate(int id) {
        if (!ID_LOCK.contains(id)) {
            boolean addedNew = ID_LOCK.add(id);
            if (addedNew) {
                try {
                    SomeObject object = loadObject(id);
                    if (!object.isUpdated()) {
                        object.update();
                        object.save();
                    }
                } finally {
                    ID_LOCK.remove(id);
                }
            }
        }
    }

    private SomeObject loadObject(int id) {
        // Load the object (from database/remote call)
        return someObject;
    }
}

The code above is a fair representation of the production code used to solve this problem and works as desired. A few points before looking at the implementation details. The loadObject() and object.update() calls are quite expensive and involve remote calls and over-the-wire object serialisation. It is also entirely possible for the object to be changed without the local copy being updated or notified of the change. I raise these points because they are significant for the (slightly off topic) synchronisation discussion below.

Let's start by looking at the flow for a single threaded (non-duplicate) call to the performUpdate method.

  1. !ID_LOCK.contains(id) condition (line 5) passes as the set is empty
  2. id is added and the addedNew variable is set to true (line 6)
  3. addedNew condition is met and the object is loaded (lines 7 - 9)
  4. !object.isUpdated() condition is met and the object is updated and saved (lines 10 - 12)
  5. id is removed from the set in the finally clause (line 15)
This is the simplest scenario. Extending it to two (or n) concurrent calls to the performUpdate method with distinct IDs follows exactly the same flow just interleaved with the same operations potentially at different stages of execution. So for example, two concurrent calls for id1 and id2 could look as follows:
  1. !ID_LOCK.contains(id1) condition (line 5) passes as the set is empty
  2. id1 is added and the addedNew variable is set to true (line 6)
  3. !ID_LOCK.contains(id2) condition (line 5) passes as the set contains id1
  4. id1 addedNew condition is met and the object is loaded (lines 7 - 9)
  5. id2 is added and the addedNew variable is set to true (line 6)
  6. id2 addedNew condition is met and the object is loaded (lines 7 - 9)
  7. id2 !object.isUpdated() condition is met and the object is updated and saved (lines 10 - 12)
  8. id2 is removed from the set in the finally clause (line 15)
  9. id1 !object.isUpdated() condition is met and the object is updated and saved (lines 10 - 12)
  10. id1 is removed from the set in the finally clause (line 15)
The really interesting stuff happens when two concurrent calls occur with the same ID. Fortunately the code maintains the critical section condition for all permutations:
  • Thread 1 is at line 7 (set contains id), thread 2 fails the condition at line 5
  • Both threads execute line 6, only one of them will return true, the other thread will fail at line 7
  • Thread 1 is at line 16 (id has been removed from the set), thread 2 will fail on line 10 (object already updated)
A few notes about the implementation:
  • The try/finally block is very important to ensure the lock is always released
  • The object.isUpdated() check is only required if the update method can't be called on an object that has already been updated
  • The synchronisation of the set could probably be optimised (instead of using the Collections synchronisation) but it would complicate the implementation
  • TreeSet is used for the ordering of the values; this is not strictly needed, as the underlying implementation of a Set is generally a Map, where a look-up has a similar complexity to an insert into a SortedMap
  • As always, this is just one solution to the problem. I am sure others exist, please feel free to comment if you have a more elegant solution
As you may have guessed (if you made it this far in the post), I am quite a fan of Sets and lament the fact that they are not used more widely, particularly when they are well suited to solving a problem.
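On the note above about optimising the set synchronisation: one option (available since Java 6) is a concurrent set built from ConcurrentHashMap, whose atomic add() serves as the test-and-set. A minimal sketch of just the guard logic, with hypothetical method names:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdGuard {
    // A concurrent set backed by ConcurrentHashMap; add() is atomic, so it
    // plays the same role as the synchronized TreeSet in the version above.
    private static final Set<Integer> ID_LOCK =
            Collections.newSetFromMap(new ConcurrentHashMap<Integer, Boolean>());

    static boolean tryLock(int id) {
        return ID_LOCK.add(id);   // true only for the first concurrent caller
    }

    static void unlock(int id) {
        ID_LOCK.remove(id);       // always call from a finally block
    }
}
```

The performUpdate flow is unchanged: proceed only when tryLock returns true, and release in a finally block.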

Tuesday 12 July 2011

Handling raw type and type safety warnings when using legacy code

The addition of generics in Java 5 enabled type checking at compile time. This assists in preventing a ClassCastException at runtime, but to maintain backward compatibility generics also came with Type Erasure:
Type erasure exists so that new code may continue to interface with legacy code. Using a raw type for any other reason is considered bad programming practice and should be avoided whenever possible.

When mixing legacy code with generic code, you may encounter warning messages similar to the following:

Note: WarningDemo.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

Unfortunately for projects with large legacy code bases or dependencies on legacy third party libraries the raw type warnings can be a major irritation if you prefer not to have false-positive warnings. The solution is to add the @SuppressWarnings("rawtypes") annotation which works as expected, telling the compiler to ignore the raw type issue. The problem with this approach (as discussed elsewhere) is that your code ends up having an annotation for each call to the legacy code which in turn reduces readability. Adding @SuppressWarnings({ "unchecked", "rawtypes" }) at the class level will also suppress all the warnings but may unintentionally mask instances that should be flagged as warnings. An alternative approach is provided below.

Let's assume that you have a legacy code method (which you are not able to change) as defined below:
/**
 * Retrieve the list of values
 * @return A vector of string values
 */
public static Vector getValues();

In your code you would like to use generics to process the returned vector as below:
public static void main(String[] args) {
    Vector<String> safeVector = getValues();
    for (String value : safeVector) {
        System.out.println(value);
    }
}

Compiling your code gives the "uses unchecked or unsafe operations" warning. Rather than adding the suppress warnings to every line where getValues() is being called you can instead define a wrapper method returning the typed vector as below:
@SuppressWarnings({ "unchecked", "rawtypes" })
public static <T> Vector<T> castType(Vector v) {
    return (Vector<T>) v;
}

This means your code is changed as follows:
public static void main(String[] args) {
    Vector<String> safeVector = castType(getValues());
    for (String value : safeVector) {
        System.out.println(value);
    }
}

The warning messages are no longer a problem and the code functions as expected. A few notes on the implementation:

  • The type T is inferred from the Vector<String> declaration; the castType method would work equally well for an integer vector (Vector<Integer>).
  • The warning is merely being suppressed, the castType method is not actually checking the type of the vector elements. A runtime exception will occur when accessing an element if the vector passed to the method is not actually of the declared type.
  • The castType method should only be used when you are certain of the type of elements contained in the vector (restating the above point).
  • The castType implementation is using vectors to demonstrate the concept but can be extended to other types (or even generalised to use the Collection interface)
  • An argument can be made against the overhead of the castType method call compared to a class level @SuppressWarnings annotation; this implementation merely provides an alternative.

For completeness the generalised method would look as follows:
@SuppressWarnings({ "unchecked", "rawtypes" })
public static <T> Collection<T> castType(Collection v) {
    return (Collection<T>) v;
}
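A quick exercise of the generalised version; the raw-typed legacyValues method below is a stand-in for the legacy API, not part of it:

```java
import java.util.ArrayList;
import java.util.Collection;

class CastDemo {
    @SuppressWarnings({ "unchecked", "rawtypes" })
    static <T> Collection<T> castType(Collection v) {
        return (Collection<T>) v;
    }

    // Stand-in for a legacy method returning a raw collection of strings.
    @SuppressWarnings({ "unchecked", "rawtypes" })
    static Collection legacyValues() {
        Collection values = new ArrayList();
        values.add("a");
        values.add("b");
        return values;
    }
}
```

A caller can then write Collection<String> safe = CastDemo.castType(CastDemo.legacyValues()); with no unchecked warnings at the call site.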

Sunday 10 July 2011

Using the EJB3 timer service instead of Thread.sleep()

As with most software development, a large amount of effort is spent testing, be it functional or regression. The effort required to coordinate testing increases significantly when integration with external systems is required, particularly if those systems are not part of the same division or organisation. I recently volunteered to assist with reducing this overhead for an interface to our payment gateway; this post uses that implementation to demonstrate the use of the EJB 3 timer service for delaying processing.

The gateway in this system is responsible for routing messages to a payment network and processing asynchronous responses which are then communicated to the originating source system. In this case the source system interface to the gateway is quite simple, using TextMessage objects placed on a JMS queue hosted on JBoss. The response messages from the gateway are simple ACK/NACK messages containing an identifier and are placed on a separate JMS response queue processed by the source system. Unfortunately the test payment network is not always available and is only able to accept payment messages for specific test entities; this limits the variety of test cases and introduces reliability issues in the regression testing. Given the simple interface mechanism it was logical to create a gateway simulator that merely processed the request queue and sent corresponding responses to the response queue. The original implementation was as follows:

@MessageDriven(activationConfig = {
  @ActivationConfigProperty(propertyName = "destinationType", 
    propertyValue = "javax.jms.Queue"),
  @ActivationConfigProperty(propertyName = "destination", 
    propertyValue = "queue/InQueue")
  //...
  })
@TransactionManagement(value = TransactionManagementType.CONTAINER)
@TransactionAttribute(TransactionAttributeType.REQUIRED)
public class SimulateGatewayMDB implements MessageListener {
    static Logger log = Logger.getLogger(SimulateGatewayMDB.class);

    @Resource(mappedName = "ConnectionFactory")
    private QueueConnectionFactory connectionFactory;

    @Resource(mappedName = "queue/OutQueue")
    private Queue responseQueue;

    public void onMessage(Message message) {
        try {
            String reference = message.getStringProperty("Reference");
            log.info("Simulator received message: " + reference);
            queueReply(reference);
        } catch (Exception e) {
            log.error("Message processing failed.", e);
        }
    }

    protected void queueReply(String sourceReference) throws Exception {
        final String SEPARATOR = ";";
        StringBuffer message = new StringBuffer("GWRESPONSE");
        message.append(SEPARATOR);
        message.append(sourceReference).append(SEPARATOR);
        message.append("ACK").append(SEPARATOR);
        message.append("ACK by SimulateGW").append(SEPARATOR);
        message.append("Response").append(SEPARATOR);
        publishMessage(message.toString());
    }

    private void publishMessage(String message) throws Exception {
        QueueConnection queueConnection =
            connectionFactory.createQueueConnection();
        QueueSession queueSession = queueConnection.createQueueSession(false, 
            Session.AUTO_ACKNOWLEDGE);
        TextMessage textMessage = queueSession.createTextMessage(message);
        QueueSender sender = queueSession.createSender(responseQueue);
        sender.send(textMessage);
        queueConnection.close();
    }
}

This implementation works exactly as expected, placing a response message on the response queue for each request message. Unfortunately it works a bit too well... In the production environment the message is sent to an external network and has a delay between the request being sent and the response being received (hence the asynchronous processing). This delay allows the source system to perform additional processing, updating the message status and moving it along a workflow. The almost instantaneous response from the simulator prevents this processing from occurring which causes some of the test cases to fail.

The obvious solution to this timing issue is to introduce an artificial delay between the request being received and the response being sent. In a standalone application this delay would typically be accomplished by calling Thread.sleep() to delay further processing. The EJB specification is quite clear about not interfering with the container thread management:

"The enterprise bean must not attempt to manage threads. The enterprise bean must not attempt to start, stop, suspend, or resume a thread, or to change a thread’s priority or name. The enterprise bean must not attempt to manage thread groups.

These functions are reserved for the EJB container. Allowing the enterprise bean to manage threads would decrease the container’s ability to properly manage the runtime environment."

It is unclear whether or not Thread.sleep() should be considered a thread management attempt but it is preferable to use the container provided timer service. Fortunately this service is remarkably simple to use:
  • Add a TimerService resource
  • Add a @Timeout annotated method as a callback on timer expiry
The updated code (containing the changed sections):

    @Resource
    javax.ejb.TimerService timerService;

    public void onMessage(Message message) {
        try {
            String reference = message.getStringProperty("Reference");
            log.info("Simulator received message: " + reference);
            timerService.createTimer(10000, reference);
        } catch (Exception e) {
            log.error("Message processing failed.", e);
        }
    }

    @Timeout
    public void sendResponse(Timer timer) {
        try {
            log.info("Sending response: " + timer.getInfo());
            queueReply((String) timer.getInfo());
        } catch (Exception e) {
            log.error("Message processing failed.", e);
        }
    }

The changes above introduce a ten second delay between the request being received and the response being sent. A few notes about the implementation:

  • The createTimer call creates a single-action timer which is triggered only once, when the specified period (in milliseconds) has passed.
  • The reference object passed into the createTimer method is a string but can be replaced with any class implementing the Serializable interface
  • There are various other methods serving different purposes which are provided by the TimerService interface
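For comparison, outside a container (where managing threads is allowed) the same one-shot delay could be sketched with a ScheduledExecutorService instead of Thread.sleep(); the class and method names below are illustrative:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DelayedResponder {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Schedule a one-shot callback, analogous to createTimer(delayMs, reference);
    // the latch stands in for the queueReply call so the effect is observable.
    void scheduleReply(final String reference, long delayMs, final CountDownLatch done) {
        scheduler.schedule(new Runnable() {
            public void run() {
                // queueReply(reference) would be invoked here
                done.countDown();
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```

This avoids blocking the calling thread the way Thread.sleep() would, which is the same property the container timer service provides.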

Andrew Lee Rubinger and Bill Burke's book Enterprise JavaBeans 3.1 is an excellent resource for EJB3 implementation. Additional credit to O'Reilly for providing quality book downloads in a number of DRM-free formats. Thanks to Alex Gorbatchev's SyntaxHighlighter for providing the code formatting script embedded in this page.

First post

The purpose of this blog is to share information I have found to be useful. Some of the information may be available elsewhere on the net but possibly not in the context it is presented here. The idea for this blog was realised (UK spelling) when considering the focus on consuming and sharing existing information rather than producing potentially original content.

Given my focus on technology, and more specifically software development, a lot of the posts are likely to have technical content. There is no shortage of arbitrary sharing of inane daily information on the net and while this can be amusing, for the most part it is merely narcissistic and transient. As with all things, I expect the nature of this blog to change over time, hopefully improving as it changes.