Trust Me, I'm an Architect

Tuesday, February 14, 2017

Disregarding Character Encoding: A Full-Stack Developer's Unforgivable Sin

Why are we talking about character encoding?

I know what you're thinking. "This has been covered before." or "Why are you dredging up a history lesson?" It has become clear over the past several years of my career that an astonishing number of developers are either unaware of or indifferent to character encodings and why they are important.

Unfortunately, this isn't just a history lesson. Today, in a full stack developer's world, the topic of character encoding is more important than ever. The need to integrate in-house and vendor services with varying server and client technologies together into a reliable application requires developers to pay close attention to character encoding. Otherwise, you risk some potentially embarrassing production bugs that will cost your team valuable "street cred". The aim of this article is to reach back in the vault and remind everyone why this topic is still important.

In a perfect world, you're the consumer of an enlightened service that encodes everything to UTF-8, you're decoding everything using the common default of UTF-8, and you'll never need to worry. In practice, this is the assumption being made most of the time. But what happens if you're calling a kludgy service that encodes responses using Windows-1250 (similar to ISO-8859-2) due to some unknown setting buried in a misunderstood framework configuration file on some long extinct, unsupported platform? Well, the answer is "absolutely nothing" as long as you are only supporting characters that are encoded to the same bytes in both encodings. As developers, we need to be prepared to handle these situations when we encounter them.

So to solve this problem, there are some things you need to know. You have probably heard of US-ASCII or UTF-8. Most developers have a general understanding that they are character encodings and the difference between them. Some developers also understand the difference between a character set that defines code points, and a character encoding that specifies how to encode a code point as one or more bytes. If you are interested in learning more about how encoding works, or the history of character encoding, I would recommend reading the following two articles.

What every programmer absolutely, positively needs to know about encodings and character sets to work with text

David C. Zentgraf
http://kunststube.net

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Joel Spolsky
http://joelonsoftware.com

For the purposes of demonstration, I will refer to three of the most commonly used character encodings, US-ASCII, ISO-8859-1, and UTF-8 in this article, but the ideas presented are relevant to all existing character encodings.

Being generally aware of character encodings is not enough to prevent us from making frequent, avoidable mistakes. What makes this interesting is that a majority of the time the character encoding used to decode bytes doesn't matter due to the overlapping of the more popular character sets. For example, US-ASCII contains a character set of 128 visible and control characters. ISO-8859-1 has 256 characters and is a superset of US-ASCII with the first 128 code points identical to US-ASCII; it is one of many character sets to share that trait. The Unicode character set incorporates all 256 characters of ISO-8859-1 as the first code page of its 1,112,064 code points. I might be alone in this, but I wish they hadn't made these character sets overlap. Even if they were "off-by-one", it would have become clear to developers that they were decoding using the wrong format and this would already be common knowledge.

To illustrate what I mean, consider the word "HAT". The US-ASCII encoding is simply the numeric value of each character. Consulting an ASCII chart yields the following encoding.

HAT = 01001000 01000001 01010100

Now consider the fact that both ISO-8859-1 and UTF-8 share the same first 128 code points. As it turns out, all three encodings represent "HAT" with the same byte string 01001000 01000001 01010100. So essentially, I can encode "HAT" with any one of these three encodings, and decode it with another and I'll always get the correct result back out. This is the source of much of the confusion surrounding character encodings.
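The claim is easy to check in code. Here is a minimal sketch in Java; the class name HatBytes and the helper toBinary are names I made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HatBytes {

    // Renders each byte as an 8-digit binary string, separated by spaces.
    static String toBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String bits = Integer.toBinaryString(b & 0xFF);
            // Left-pad with zeros so every byte shows all 8 bits.
            for (int i = bits.length(); i < 8; i++) {
                sb.append('0');
            }
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ascii = "HAT".getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = "HAT".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "HAT".getBytes(StandardCharsets.UTF_8);

        System.out.println(toBinary(ascii));
        // All three encodings produce the same three bytes for "HAT".
        System.out.println(Arrays.equals(ascii, latin1) && Arrays.equals(ascii, utf8));
    }
}
```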

Herein lies the challenge. You may think you have everything correctly configured and it may work for years. Yet suddenly you may be faced with garbled text in your beautiful application. You may have been calling this service for ages and never realized that something was incorrectly configured. The day that this service needs to return the customer name "Günther", we're going to have a problem. When we attempt to decode this in UTF-8, it's going to be displayed as "G�nther". So, who broke the application?
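To see the failure for yourself, here is a small sketch that assumes the service encoded the name as ISO-8859-1 while the client decodes it as UTF-8. The class and method names are hypothetical, and the ü is written as the escape \u00FC so that the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GuntherMojibake {

    // Encodes text with one charset and decodes the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] wire = text.getBytes(encodeWith);
        return new String(wire, decodeWith);
    }

    public static void main(String[] args) {
        String name = "G\u00FCnther";

        // Matching charsets: the name survives.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Mismatched charsets: the single byte 0xFC is not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```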

Strategies for avoiding encoding problems

Before I dive into some specific examples, my blanket advice is to be explicit about which encoding format you are using. Admittedly, this is advice that I'm not always good at taking. However, the fact remains that a vast majority of code I've seen uses the platform-default encoding, which is almost always UTF-8. This is fine until we run the code on a platform that doesn't use this as a default or we read a file encoded with some other format.
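As a rough illustration of what "being explicit" looks like in Java, the sketch below always passes a Charset when turning bytes into characters rather than relying on the platform default. The helper readAll and the class name are my own inventions for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetIo {

    // Reads the whole stream as text, decoding with the charset the caller names.
    // new InputStreamReader(in) without a charset would use the platform default.
    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, read);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pretend these bytes came from a file or service known to use ISO-8859-1.
        byte[] payload = "G\u00FCnther".getBytes(StandardCharsets.ISO_8859_1);
        String text = readAll(new ByteArrayInputStream(payload), StandardCharsets.ISO_8859_1);
        System.out.println(text.equals("G\u00FCnther"));
    }
}
```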

Web Pages / AJAX

When you load a web page, the browser has to know how the html was encoded. Generally, this encoding is set via a configuration property on the web server that is hosting the web page you loaded. That information is communicated to the browser via the meta tag in html. You've probably seen this:

<html>
   <head>
       <meta charset="UTF-8" />
       ...
   </head>
   ...
</html>

This informs the browser how to interpret the bytes it received from the server. How was it able to read these characters if it didn't know the encoding, you ask? The browser has to "guess", partially decode the bytes until it finds the charset, and then throw it out and start over with the correct charset. I guess it's a good thing that US-ASCII is the de facto standard for the first 128 code points, right?

When a web page makes an AJAX request to a server, it seems logical that the Content-Type header would contain the charset used to encode it. In fact, this is not true. According to the XMLHttpRequest specification³, the encoding used for an AJAX request is always UTF-8. Attempts to override this behavior are supposed to be ignored by all browsers if the spec is correctly implemented.

Web Services

When it comes to REST and SOAP service calls made outside of a web application, things get a little trickier. REST isn't really a specification, but a set of practices that make use of the HTTP specification. SOAP, on the other hand, is a network messaging protocol specification that is commonly transferred via HTTP, but can also use many other transports such as JMS or SMTP. For REST, arguably the "most correct" way to specify charsets is using the Content-Type header for denoting payload encoding, and the Accept-Charset header for denoting the desired response payload encoding.
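On the receiving side, honoring that header means pulling the charset parameter out of Content-Type before decoding the payload. The following is only a sketch with hypothetical names; most HTTP client libraries already do this for you, and the fallback charset is something the caller has to decide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Returns the charset named in a Content-Type header value,
    // or the supplied fallback if none is present or it is not supported.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The parameter value may be quoted, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("application/json", StandardCharsets.UTF_8));
    }
}
```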

However, we are generally at the mercy of the toolset we are using for the actual implementation of this, particularly when dealing with SOAP. For example, Apache CXF⁴, a commonly used SOAP framework, allows you to configure the encoding charsets for the HTTP transport globally via configuration file, or by overriding it per endpoint in the WSDL. My advice is to read the documentation thoroughly when configuring one of these frameworks.

Database / JDBC

I've seen some vague tips in the past on dealing with database encoding configurations. My advice is to ignore them entirely. Generally speaking, the encoding used internally by the database is entirely encapsulated within the database driver and the database implementation itself. If you look at the JDBC spec⁵ as an example, either the source encoding is documented in the API, or it is stored as raw bytes in whatever encoding the client used. For example:

PreparedStatement#setAsciiStream specifies that the encoded bytes in the InputStream must be ASCII
PreparedStatement#setCharacterStream takes in a Reader instance that must contain Unicode characters (i.e. InputStreamReaders would need to properly decode the source stream)
PreparedStatement#setBinaryStream takes an InputStream, but the resulting bytes are stored in a VARBINARY or LONGVARBINARY where the raw bytes are stored and no encoding steps take place

JMS

The JMS specification describes a messaging system, so it would seem a natural consequence that encoding of those messages becomes an important configuration. In reality, the JMS service providers handle all of the character conversion coming in and out of the JMS implementations. There are obviously going to be some considerations here if you are forwarding from one provider implementation to another using some sort of JMS bridging concept, but those are generally configurable. It is possible to create BytesMessage instances where you have complete control over how those bytes are encoded on the producer side and decoded on the consumer side, but this is an application-level decision and is not a configuration consideration.

How do we repair the damage?

There is no easy canned solution to encoding problems. It can be very difficult to figure out what is causing your issues after the damage has been done. Without knowing exactly what is causing the issue, it is next to impossible to repair the damage upon decoding as mentioned in David Zentgraf's article (see section "My document doesn't make sense in any encoding!"). The approach for finding the cause depends on where the issue manifests itself.

Text Editor

The simplest case is opening a file in a text editor. Many editors offer the ability to read and write using a variety of character encodings. For instance Notepad++, SublimeText, and Vim, just to name a few, all support multiple encodings. Set a default that makes sense for you, though this is generally going to be UTF-8. If possible, when you open a file be aware of the encoding used. Failing that, if it doesn't look right, it's easy enough to click through the usual suspects to find the right encoding. If it doesn't look right in any encoding, the file has likely been irreparably broken by some other process.

IDE

An IDE is really just a fancy text editor, so it also allows default encodings to be set when saving files. Most compilers also have similar settings that allow for source file encoding properties to be set. If you build a test file in an IDE and read it in during a test, e.g. a JUnit test, you'll want to decode that file using the correct character set. If it isn't the platform default, make sure you explicitly set the value.

Web Pages

If you see gobbledygook on a web page, things start to get a little more difficult. If it's static html, check the encoding of the source html and the meta charset tag on the page. Easy, right? It gets a little harder if it's a single-page web application that calls some number of services, which in turn call a set of 3rd party services and so on. Or maybe you're in a microservice environment where you have service composers that call dozens of different focused services. At that point, my advice is to start from the endpoint and trace the data backwards. Use Postman, SoapUI, or a host of other tools to make the service calls directly. Once you've located the culprit that mangles the text, you can start checking configurations and file encodings to figure out what happened.

3rd Party Service

What if, as in the previous example, you find out that the culprit was a 3rd party vendor service? Your best bet is to start by reaching out to them to ask about the configuration: "What encoding do you use for your responses?" In a perfect world, they'll say "Oh, that's on page 12 of our documentation. We always use ISO-8859-1" or maybe "Oh, that's configurable… just send us this header". Then you configure your client to match and, voila, problem solved. Just as commonly, you might get a response that they don't know, think they don't have control over that, or say they'll "get back to you". If the answer is at all nebulous, you can try some of the usual suspects and see if the responses are decoded correctly. Ultimately, you may be at the mercy of your vendor.
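Trying the usual suspects can be automated once you have captured the raw response bytes. The sketch below uses hypothetical names and a hard-coded sample payload; it decodes the same bytes strictly with each candidate charset so that encodings in which the bytes are not even valid can be ruled out. A candidate that decodes without error is not necessarily the right one, so you still need to eyeball the result.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class UsualSuspects {

    static final String INVALID = "<not valid in this encoding>";

    // Decodes the same bytes with every candidate charset, in order.
    static Map<String, String> decodeWithEach(byte[] raw, Charset... candidates) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Charset candidate : candidates) {
            try {
                // A fresh CharsetDecoder reports malformed input as an exception
                // instead of silently substituting U+FFFD like new String(...) does.
                String text = candidate.newDecoder().decode(ByteBuffer.wrap(raw)).toString();
                results.put(candidate.name(), text);
            } catch (CharacterCodingException e) {
                results.put(candidate.name(), INVALID);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample payload: "Günther" as an ISO-8859-1 service would send it.
        byte[] raw = {0x47, (byte) 0xFC, 0x6E, 0x74, 0x68, 0x65, 0x72};
        decodeWithEach(raw, StandardCharsets.UTF_8, StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1)
                .forEach((charset, text) -> System.out.println(charset + ": " + text));
    }
}
```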

Creativity For The Win!

I want to share with you a fun experience related to character encodings I recently encountered at a client that wasn't caused by configurations. In this scenario, we noticed that users with non-ASCII characters in their names were not being displayed properly on the website when they logged in. Our first guess was a configuration error, so we started unwinding the application as I mentioned in the previous section. As we worked our way back we found that this data was coming from a cookie. The application called a service that called another, much older service that fetched user profile data and used a Set-Cookie header to add this cookie.

Naturally, we assumed the problem was a configuration on this older service. We started looking at ways to configure the service correctly. However, as we dug deeper, we found the following exception in the logs:

java.lang.IllegalArgumentException: Control character in cookie value or attribute.

Control characters, what is this nonsense? To figure out what was going on, my colleague started reading ancient specs on Cookies. I use the term "spec" very loosely here. The original version of the Cookie spec was written back in 1997 and left out a lot of important details. The result is that every browser and server was left to their own devices when implementing cookie handling.

In the most recent attempt to clean up this old spec in 2011 (see RFC 6265⁶), one important historical footnote was carried forward. Specifically, cookie values could contain only US-ASCII characters. Actually, it is a subset of the US-ASCII charset excluding control characters, whitespace, double quote, comma, semicolon and backslash.
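That restriction is small enough to express directly. Here is a sketch of a validator based on the cookie-octet rule in RFC 6265 (%x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E); the class and method names are mine, and this is not the check JBoss itself performs.

```java
public class CookieValueCheck {

    // True if the character is a cookie-octet as defined by RFC 6265:
    // printable US-ASCII except space, double quote, comma, semicolon and backslash.
    static boolean isCookieOctet(char c) {
        return c == 0x21
                || (c >= 0x23 && c <= 0x2B)
                || (c >= 0x2D && c <= 0x3A)
                || (c >= 0x3C && c <= 0x5B)
                || (c >= 0x5D && c <= 0x7E);
    }

    // True if every character of the (unquoted) value is a cookie-octet.
    static boolean isValidCookieValue(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (!isCookieOctet(value.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidCookieValue("Gunther"));       // plain ASCII letters
        System.out.println(isValidCookieValue("G\u00FCnther"));  // contains a non-ASCII character
    }
}
```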

Then we started to question, "Why did it break now when it's been working for years?". We looked into the browsers and found that they generally support more than just the specified visible US-ASCII characters, so that didn't seem to be the problem. Then it occurred to us that this old service application had recently been ported from an old version of IBM WebSphere to a recent version of JBoss. As it turns out, most application servers do the same thing the browsers do; they support non-ASCII characters in the Set-Cookie headers sent to the browser. However JBoss is famous (notorious?) for following specifications very closely and throwing exceptions when invalid values are detected.

Knowing what caused the problem was only half the battle. We needed to figure out how to get around this seemingly pointless limitation. We considered using Base64 encoding, which would force the string into the ASCII subset allowed by the spec. However, decoding Base64 in JavaScript for display would have been somewhat problematic. Instead, we decided to use URL encoding to solve the problem. When adding the Set-Cookie header, we URL-encoded the user's name, which turns "Günther" into "G%C3%BCnther"; these are all valid characters! Then in the JavaScript we could use window.decodeURI on the cookie value to get back the original non-ASCII user name.
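On the server side, the encoding step might look something like the sketch below. This is not the client's actual code; the class and method names are made up. One assumption worth calling out: java.net.URLEncoder produces form encoding, where a space becomes "+", so the sketch rewrites "+" as "%20" to give the JavaScript decoder something it understands.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CookieNameEncoding {

    // Percent-encodes a value as UTF-8 so it contains only cookie-safe ASCII.
    static String encode(String value) {
        try {
            // URLEncoder emits "+" for spaces; any literal "+" in the input has
            // already become "%2B", so replacing "+" here only affects spaces.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("G\u00FCnther"));
    }
}
```

Keep in mind that decodeURI leaves escapes for reserved characters such as "%2B" or "%2C" untouched, so if names can contain those characters, decodeURIComponent may be the better fit on the browser side.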

Conclusion

So how do we avoid all of this? David Zentgraf said it best:

"It's really simple: Know what encoding a certain piece of text [is in]".

I often hear people say that everyone should just use UTF-8 since it is a superior encoding format. As English speakers, it is natural to gravitate toward UTF-8 since it generally is able to encode text using fewer bytes than any of the competitors that are able to encode the entire Unicode character set. However, this may not be the correct stance in an increasingly global world. Consider the fact that the UTF-8 variable-length encoding comes at a cost. Because UTF-8 spends some bits of every byte in a multi-byte sequence on marking the length and continuation of the sequence, most Chinese characters take 3 bytes in UTF-8 while they take only 2 bytes in UTF-16.
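The size difference is easy to measure. In this sketch (hypothetical names again), UTF-16BE is used rather than Java's plain UTF-16 charset because the latter prepends a 2-byte byte order mark when encoding, which would skew the count.

```java
import java.nio.charset.StandardCharsets;

public class EncodedSizes {

    // Number of bytes needed to encode the text as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode the text as UTF-16 (big endian, no BOM).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String english = "HAT";
        String chinese = "\u4E2D"; // a common Chinese character (U+4E2D)

        System.out.println("HAT    UTF-8: " + utf8Length(english) + ", UTF-16: " + utf16Length(english));
        System.out.println("U+4E2D UTF-8: " + utf8Length(chinese) + ", UTF-16: " + utf16Length(chinese));
    }
}
```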

My intent is not to preach the benefits of any particular format. Rather I believe we as developers need to understand how these encodings work and be explicit in our dealings with them. Future developers will thank you for not sending them down this particular rabbit hole when things don't work.

External Links:

The Importance of Not Knowing

As I have progressed throughout my software career I have noticed a trend that proliferates through many aspects of my craft. Specifically, that is the importance of not knowing every detail. For instance, consider the power in producing a useful abstraction. The power is derived not from the details you share, but the details you don't have to share. Since the word "abstraction" has many definitions, I will offer mine for the sake of clarity.

Abstraction

The calculated, contextually subjective selection of relevant detail to reduce the complexity of some subject for facilitating simplification of thought and communication about that subject, or the product thereof.

That is a lot to take in; however, I'm guessing this is not a new concept for most. What I find interesting is that the inherent value of any given abstraction changes depending upon its applicable purpose. For instance, Motor Trend™ magazine presents awards to car makers within a given category like "Compact Cars" or "Minivans". However, it's probably easier to teach your child how to sort their Hot Wheels™ by color or size instead of by Motor Trend category.

When I started to transition into a Software Architect role, I thought about this idea a lot. In fact, in one of my performance review self evaluations, I referred to the ability to know the unknowable as an "Architectural Spidey Sense", a quality I desperately craved. The problem was that I ultimately did not know what steps I needed to take in order to grow that elusive talent. Thus began my quest to achieve this lofty goal.

I was recently perusing the excellent book "97 Things Every Software Architect Should Know" and read the essay titled "There Is No One-Size-Fits-All Solution"¹. In this essay, Stafford discusses the term "contextual sense" drawing from the 1991 book by Eberhardt Rechtin². To summarize, contextual sense describes the application of experience to solve a difficult problem. The more experience and wisdom you have at your disposal, the more likely you are to navigate to an optimal solution. Truly it is common sense to those that have the required experience, but is indistinguishable from witchcraft for those who are lacking it.

It immediately struck me; I had finally found a proper term to add to my vocabulary to replace the primitive "Spidey Sense"! Once I had supplied a name and meaning to this vague idea, I needed to identify a path to achieving it. I discovered a potential answer in the video "Software Architecture Fundamentals: Understanding the Basics"³, available from O'Reilly Media. In the section entitled "Architecture Soft Skills Part 2", Mark Richards shares his idea of the Knowledge Pyramid. Neal Ford has blogged about this concept on his own website⁴ and I encourage you to take a look. In that article he explains Mark's concept of the pyramid and how it relates to technical knowledge and experience.

For me the most revealing aspect of this Knowledge Pyramid was the shifting of my own priorities as a professional. For the first 10 years of my career, I was heavily focused on increasing my technical depth, as defined by Mark. In other words, I was trying to actively engage in and have working knowledge of a wealth of different techniques, technologies, and frameworks. However, the amount of technical depth you can acquire is finite, as knowledge grows stale for all but a few extraordinary individuals. As a result, the growth of my technical breadth beyond that depth was organic, bordering on accidental.

My increasing interest in discovering new ideas or technologies over the past few years was exactly what I needed to break out of that pattern and grow my technical breadth. I don't need to know every detail about a framework in order for that knowledge to serve a purpose. All I need to know is what problem it solves and how its solution is different, or better, than others. With the rapid rate of change in our industry, being able to consume and retain a high-level understanding of new technologies is paramount. This is the most important abstraction I have devised thus far in my career.

The definition of contextual sense can be deceptively simple because it covers the entire breadth of applicable experience. The Principle of Linguistic Relativity⁵ indicates that our thoughts are defined or shaped by, to some extent, the language in which we speak. Having a better definition for a pivotal concept or set of concepts can help to better define a universe of discourse. Certainly, being able to quickly refer to an entire set of principles and ideas in two simple words is a powerful abstraction. I hope that sharing my meandering path to understanding will provide some value to others.

Sources:

¹ Stafford, Randy. "There Is No One-Size-Fits-All Solution." 97 Things Every Software Architect Should Know. Ed. Richard Monson-Haefel. 1st ed. Sebastopol: O'Reilly Media, 2009. 24-25. Print.

² Rechtin, Eberhardt. Systems architecting: creating and building complex systems. Englewood Cliffs, N.J: Prentice Hall, 1991. Print.

³ Software Architecture Fundamentals: Understanding the Basics. Perf. Neal Ford and Mark Richards. O'Reilly Media, 2014. Presentation. O'Reilly Media, Mar. 2014. Web. Feb. 2017.

⁴ Ford, Neal. "Knowledge Breadth Versus Depth." Nealford.com. N.p., 08 Sept. 2015. Web. 12 Feb. 2017. <http://nealford.com/memeagora/2015/09/08/knowledge-breadth-versus-depth.html>.

⁵ Wikipedia contributors. "Linguistic relativity." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 28 Jan. 2017. Web. 12 Feb. 2017.

Thursday, October 8, 2015

EJB vs. Spring: Simple Soap Services

This is the second article in the EJB3 vs Spring series. I will be focused on the differences between Spring and EJB3 when exposing a simple SOAP service. In case you missed it, there is an explanation of my maven project layout in my first article, EJB3 vs Spring: Rest Services. As always, all of the source code is available for your perusal on github at https://github.com/jgitter/fwc. In order to get the most out of this article, you'll want to be familiar with SOAP and JAX-WS. Two good places to get started are http://www.soapuser.com/basics1.html and http://docs.oracle.com/javase/7/docs/technotes/guides/xml/jax-ws/.

If you wish to test out any of the services mentioned for yourself, there are two ways to do so. First, clone the git repository, run a maven build, and deploy it to an EJB3-enabled container of your choice (some assembly may be required for your container of choice). I have supplied a web interface which can be reached at http:///fwc where you can run all 4 versions of the findInstitutions service and examine the results. Second, I have also committed a SoapUI (5.0.0) project in the parent module that has a request defined for each.

Note that I am using WSDL version 1.1, not the newer 1.2/2.0 due to the general lack of adoption and tool support.

Contract-First or Code-First?

I was hoping to avoid this question when I undertook writing this article since I'm not interested in taking part in a holy war. I wanted to use a code-first approach for my examples simply because I am more comfortable defining my service contract in code. There are several tools available for generating a wsdl from service code, with Apache CXF likely emerging as the best of those tools, though it does much more than that. I completed the JEE version of this service using this approach quite effectively. Then I turned my attention to Spring-WS only to be greeted with the following statement in their reference documentation (http://docs.spring.io/spring-ws/docs/2.2.0.RELEASE/reference/htmlsingle/#why-contract-first):

When creating Web services, there are two development styles: Contract Last and Contract First. When using a contract-last approach, you start with the Java code, and let the Web service contract (WSDL, see sidebar) be generated from that. When using contract-first, you start with the WSDL contract, and use Java to implement said contract.

Spring-WS only supports the contract-first development style, and this section explains why.

This seems like an odd place for Spring to choose to take a stance on how something must be done, when from what I've seen they are usually aggressively flexible. However, let it be known that in my examples I used a code-first approach for the JEE implementation. I then lifted the generated schema in that wsdl to generate the Spring wsdl so I could come as close to a code-first approach as possible. While not the most correct of approaches, I believed I was able to achieve my goal.

After I was finished, it was brought to my attention that I wasn't truly comparing apples to apples. To that end, in my github project I have shown how to configure a maven project to use the CXF codegen plugin to build a service implementation from a static wsdl. If you run 'mvn generate-sources' on the fwc-ejb-web module, the generated classes will appear in the target/generated-sources folder. You would then update the service implementation class to add your implementation to any service methods that were stubbed out there. On the flip-side of that coin, I have also set up the cxf java2wsdl plugin to show an example of how you can generate a wsdl from an implementation. I'm not using the results of either of these, but hopefully they can be educational.

Taking advantage of JBoss

I'm going to be taking advantage of some of the features offered by the JBoss Application Server so, as promised, I intend to highlight those features to prevent skewing the analysis toward one framework or the other. Specifically, in my EJB3 annotated SOAP services, I'm letting the container generate my wsdl for me. It does this by using the aforementioned Apache CXF under the hood.

Alternatively, it is possible to generate your own wsdl and host it. As I mentioned earlier, my project in github shows an example of how this would be done in the plugin configuration for the fwc-ejb-web module. During the generate-resources phase it generates a wsdl from a service implementation and places it in the generated-resources directory. Then, during the compile phase you could copy that wsdl into the WEB-INF directory via the maven resources plugin. It can be referenced from there in the wsdlLocation attribute of the @WebService annotation on the service implementation. I left the sample generation there for instructive purposes, but I'm not using it.

Another way JBoss is helping me out is by rewriting the soap:address in the wsdl when it is requested. This is done by setting to true in the jboss configuration under the webservices subsystem. The Spring container is also doing this for me, but it requires a little more work, which I'll describe in detail.

EJB3 / JAX-WS SOAP Services

For a SOAP service created with JAX-WS annotations, no real configuration is necessary. For my JBoss EAP container, I only need to have the webservices sub-system enabled, which uses Apache CXF to publish your service endpoints. Your classpath will be scanned for any classes annotated with @WebService and, in my case, the container also generates the wsdl as I mentioned above. Here is my service implementation.

SOAP Service Implementation

//snip - not all imports shown here
import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebResult;
import javax.jws.WebService;
 
@Stateless
@WebService(name = "InstitutionService", serviceName = "soap-institution",
        targetNamespace = InstitutionSoapService.TARGET_NAMESPACE)
public class InstitutionSoapServiceImpl implements InstitutionSoapService {
 
    @Inject
    private InstitutionService service;
 
    @Override
    @WebMethod
    @WebResult(targetNamespace = InstitutionSoapService.TARGET_NAMESPACE,
            name = FindInstitutionsResponse.NAME)
    public FindInstitutionsResponse findInstitutions(
            @WebParam(
                    targetNamespace = InstitutionSoapService.TARGET_NAMESPACE,
                    name = FindInstitutionsRequest.NAME)
                    FindInstitutionsRequest request) {
 
        FindInstitutionsResponse response = new FindInstitutionsResponse();
 
        // snip
 
        return response;
    }
}

A breakdown of the annotations:

The @Stateless annotation marks this as a stateless session bean
The @WebService annotation is required on any Service Endpoint Implementation or endpoint interface. It informs the container that it is a JAX-WS endpoint.
- The 'name' attribute specifies the name of the web service and is used as the name of the portType in the wsdl.
- The 'serviceName' attribute specifies the service name of this web service and is used as the name of the wsdl:service in the wsdl. It is also used as the url prefix to access methods defined on this service: //fwc-ejb/soap-institution/InstitutionService
- The 'targetNamespace' attribute specifies the namespace of the service which must match the namespace on any incoming requests
The @Inject annotation comes from CDI and injects a @RequestScoped service bean for me as discussed in my Rest Service article and is not relevant to the subject at hand
The @WebMethod annotation marks a method to be exposed on a @WebService endpoint.
The @WebResult annotation describes the SOAP response, which in this case is the FindInstitutionsResponse
- The 'name' attribute specifies the name of the xml element for the response object
- The 'targetNamespace' attribute specifies the namespace of the xml element for the response object
The @WebParam annotation describes the expected SOAP request, which will be matched against any incoming requests on this endpoint. The 'name' and 'targetNamespace' attributes work exactly the same for this annotation as they do for the @WebResult annotation. If the incoming request does not contain the exact same elements and namespaces, the request will be rejected.

There is slightly more to it yet, as I still need to define my request and response objects. If you're looking for them in the code, you'll find them in the fwc-common module. When a request comes in, the XML request will be unmarshaled to the FindInstitutionsRequest object, and the response XML is obtained by marshaling the FindInstitutionsResponse back to XML. The names and namespaces of the elements are specified via JAXB annotations. They are fairly self-explanatory so I won't go into detail describing them.

Transfer Objects

@XmlRootElement(namespace = "http://fwc.gitter.org/services/")
@XmlType(name = "findInstitutionsRequest")
public class FindInstitutionsRequest {
    public static final String NAME = "findInstitutionsRequest";
 
    private String keyword;
 
    @XmlElement(name = "keyword")
    public String getKeyword() {
        return keyword;
    }
 
    public void setKeyword(String keyword) {
        this.keyword = keyword;
    }
}

//----------------------------------------------------------------

@XmlRootElement(namespace = "http://fwc.gitter.org/services/")
@XmlType(name = "institutionList")
public class FindInstitutionsResponse {
    public static final String NAME = "institutionList";
 
    private List<Institution> institutions;
 
    @XmlElement(name = "institution")
    public List<Institution> getInstitutions() {
        return institutions;
    }
 
    public void setInstitutions(List<Institution> institutions) {
        this.institutions = institutions;
    }
}
 
//----------------------------------------------------------------
 
public class Institution {
    private Map<String, String> institutionData;
    @XmlElement
    public String getAddress() {
        // I'm cheating - normally you wouldn't do this in a transfer object,
        // but I wanted the SOAP response to be aesthetically pleasing and
        // haven't implemented a domain layer.
        return new StringBuilder()
                .append(institutionData.get("ADDR")).append(" ")
                .append(institutionData.get("CITY")).append(", ")
                .append(institutionData.get("STABBR")).append(" ")
                .append(institutionData.get("ZIP")).toString();
    }
    @XmlTransient
    public Map<String, String> getInstitutionData() {
        return institutionData;
    }
    @XmlElement
    public String getInstitutionName() {
        return institutionData.get("INSTNM");
    }
    public void setInstitutionData(Map<String, String> institutionData) {
        this.institutionData = institutionData;
    }
}
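To see what getAddress() produces, here is a minimal, stand-alone sketch of the Institution transfer object with the JAXB annotations removed so that it needs nothing beyond the JDK. The class name InstitutionAddressSketch and the sampleRow() helper are made up for illustration, and the way the sample address is split across the ADDR, CITY, STABBR and ZIP keys is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone approximation of the Institution transfer object above.
// JAXB annotations are left out so the sketch compiles with the JDK alone.
class InstitutionAddressSketch {
    private final Map<String, String> institutionData;

    InstitutionAddressSketch(Map<String, String> institutionData) {
        this.institutionData = institutionData;
    }

    // Same concatenation as Institution.getAddress().
    String getAddress() {
        return new StringBuilder()
                .append(institutionData.get("ADDR")).append(" ")
                .append(institutionData.get("CITY")).append(", ")
                .append(institutionData.get("STABBR")).append(" ")
                .append(institutionData.get("ZIP")).toString();
    }

    // Hypothetical sample row; the column split is assumed, not taken from the data set.
    static Map<String, String> sampleRow() {
        Map<String, String> row = new HashMap<>();
        row.put("ADDR", "500 Silverspring Rd Ste K340");
        row.put("CITY", "Glendale");
        row.put("STABBR", "WI");
        row.put("ZIP", "53217");
        return row;
    }

    public static void main(String[] args) {
        System.out.println(new InstitutionAddressSketch(sampleRow()).getAddress());
    }
}
```

One thing worth noticing: if a key is missing from the map, StringBuilder appends the literal text "null", so a real domain layer would probably want to guard against that.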

As an example, if I send this request:

SOAP Request

<!-- HTTP headers omitted -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <fwc:findInstitutions xmlns:fwc="http://fwc.gitter.org/services/">
            <fwc:findInstitutionsRequest>
                <keyword>Milwaukee WI</keyword>
            </fwc:findInstitutionsRequest>
        </fwc:findInstitutions>
    </soap:Body>
</soap:Envelope>

I should receive a response like this:

SOAP Response

<!-- HTTP headers omitted -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns2:findInstitutionsResponse xmlns:ns2="http://fwc.gitter.org/services/">
      <ns2:institutionList>
        <institution>
          <address>500 Silverspring Rd Ste K340 Glendale, WI 53217</address>
          <institutionName>Bryant &amp; Stratton College-Bayshore</institutionName>
        </institution>
        <institution>
          <address>10950 W Potter Road Wauwatosa, WI 53226</address>
          <institutionName>Bryant &amp; Stratton College-Wauwatosa</institutionName>
        </institution>
        <institution>
          <address>4425 N Port Washington Rd Glendale, WI 53212</address>
          <institutionName>Columbia College of Nursing</institutionName>
        </institution>
        <!-- snip ... -->
      </ns2:institutionList>
    </ns2:findInstitutionsResponse>
  </soap:Body>
</soap:Envelope>
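The namespace matching described for @WebParam can be made concrete without any web service stack at all. The following is a rough sketch, using only the JDK's namespace-aware DOM parser, of the kind of check that happens before a payload is accepted. It is not what CXF or JAXB actually do internally; the class RequestNamespaceSketch and its methods are hypothetical names of my own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RequestNamespaceSketch {
    static final String SERVICE_NS = "http://fwc.gitter.org/services/";

    // Returns the keyword when element names and namespaces match, otherwise null.
    static String extractKeyword(String soapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace handling is off by default
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(soapXml.getBytes(StandardCharsets.UTF_8)));

            NodeList requests =
                    doc.getElementsByTagNameNS(SERVICE_NS, "findInstitutionsRequest");
            if (requests.getLength() != 1) {
                return null; // wrong element name or wrong namespace
            }
            // The keyword element is unqualified in the sample request.
            NodeList children = requests.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && child.getNamespaceURI() == null
                        && "keyword".equals(child.getLocalName())) {
                    return child.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Request is not parseable XML", e);
        }
    }

    // Builds a request shaped like the sample above for a given payload namespace.
    static String sampleRequest(String namespace) {
        return "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soap:Body>"
                + "<fwc:findInstitutions xmlns:fwc=\"" + namespace + "\">"
                + "<fwc:findInstitutionsRequest>"
                + "<keyword>Milwaukee WI</keyword>"
                + "</fwc:findInstitutionsRequest>"
                + "</fwc:findInstitutions>"
                + "</soap:Body>"
                + "</soap:Envelope>";
    }

    public static void main(String[] args) {
        System.out.println(extractKeyword(sampleRequest(SERVICE_NS)));
        System.out.println(extractKeyword(sampleRequest("http://example.org/other/")));
    }
}
```

With the expected namespace the keyword is found; with any other namespace the lookup comes back empty, which is the situation in which the real endpoint would reject the request.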

If you have deployed the application, the wsdl will be visible at URL http:///fwc-ejb/soap-institution/InstitutionService?wsdl. If you look at the soap:address of the wsdl:service, you'll notice what I was mentioning earlier. That address will be rewritten by the JBoss webservices subsystem to point at the exposed host and port binding for your server. This allows you to use the same WSDL for multiple servers, for example in a clustered environment, without having to know where it will be deployed at compile time.
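Conceptually, the soap:address rewriting described above boils down to keeping the path of the published location and swapping in the scheme, host and port that the server is actually reachable on. Here is a small sketch of that idea using java.net.URI. It is only an illustration, not the JBoss implementation; WsdlLocationSketch, rewrite() and the example host names are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;

class WsdlLocationSketch {

    // Keeps path and query of the published location, replaces scheme, host and port.
    // Pass -1 as the port to leave it out of the result.
    static String rewrite(String publishedLocation, String scheme, String host, int port) {
        try {
            URI published = new URI(publishedLocation);
            URI rewritten = new URI(scheme, null, host, port,
                    published.getPath(), published.getQuery(), null);
            return rewritten.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid location: " + publishedLocation, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical build-time location and hypothetical runtime binding.
        System.out.println(rewrite(
                "http://localhost:8080/fwc-ejb/soap-institution/InstitutionService",
                "https", "services.example.org", 8443));
    }
}
```

Because only the authority part changes, the same WSDL can be served from every node without knowing the node's address ahead of time.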

Spring SOAP Services with Spring-WS

For this example, I'm using spring-ws (2.2.0). As opposed to the EJB3 and JAX-WS method, which is ultimately published by Apache CXF in JBoss, Spring SOAP services are published by the Spring container. This also means that the JBoss webservices subsystem is completely unaware of my deployed Spring-WS SOAP services. In other words, I can't depend on JBoss to generate my WSDLs or to automagically rewrite the soap:address for me as it did above. The first thing we have to do is add a mapping for the SOAP requests to the Spring-WS dispatcher servlet in the web.xml deployment descriptor.

web.xml

<web-app>
    <!-- snip -->
    <servlet>
        <servlet-name>webservices</servlet-name>
        <servlet-class>
            org.springframework.ws.transport.http.MessageDispatcherServlet
        </servlet-class>
        <init-param>
            <param-name>transformWsdlLocations</param-name>
            <param-value>true</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>webservices</servlet-name>
        <url-pattern>/soap/*</url-pattern>
    </servlet-mapping>
</web-app>

The transformWsdlLocations parameter informs Spring-WS that we want the soap:address to be rewritten to the current binding address. This does the same job for us that the JBoss webservices subsystem did in the earlier example. In addition to mapping the Spring-WS dispatcher servlet, we will have to configure the application context.

applicationContext.xml

<beans>
    <!-- snip -->
    <sws:annotation-driven />
 
    <sws:dynamic-wsdl id="InstitutionService"
        portTypeName="InstitutionService"
        serviceName="InstitutionService"
        locationUri="/soap/institution/InstitutionService">
        <sws:xsd location="/schemas/InstitutionService.xsd" />
    </sws:dynamic-wsdl>
</beans>

The <sws:annotation-driven /> element allows me to configure my endpoint via annotations. In addition to this, I need to set up the WSDL generation via the <sws:dynamic-wsdl> configuration. This is where you configure the name for the portType and serviceName as well as the endpoint URL. I also had to configure the static location of a schema document that is used to generate the WSDL. I built this schema document myself rather than generating it, but there are numerous tools you can use to accomplish this. In fact, I cheated and simply copied the schema that was produced by CXF from my EJB3 example. There is a warning in the Spring-WS documentation about the use of the dynamic-wsdl configuration. Since the generation of WSDLs from one version of Spring to the next is not guaranteed to be the same, they recommend that you only use it in development. Once you're ready to release, they hint that you should copy the generated WSDL and place it in your project to use with the static-wsdl configuration to prevent unintentional changes to your WSDL contract.

That is enough to configure a simple web service. The rest of the configuration is in the service implementation itself.

SOAP Service Implementation

/* snip - not all imports shown */
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.ws.server.endpoint.annotation.Endpoint;
import org.springframework.ws.server.endpoint.annotation.PayloadRoot;
import org.springframework.ws.server.endpoint.annotation.RequestPayload;
import org.springframework.ws.server.endpoint.annotation.ResponsePayload;
 
@Endpoint
public class InstitutionSoapServiceImpl implements InstitutionSoapService {
 
    private InstitutionService service;
 
    @Autowired
    public InstitutionSoapServiceImpl(InstitutionService service) {
        this.service = service;
    }
 
    @Override
    @PayloadRoot(localPart = "findInstitutionsRequest",
            namespace = InstitutionSoapService.TARGET_NAMESPACE)
    public @ResponsePayload FindInstitutionsResponse findInstitutions(
            @RequestPayload FindInstitutionsRequest request) {
        FindInstitutionsResponse response = new FindInstitutionsResponse();
 
        /* snip */
 
        return response;
    }
}

Here is a breakdown of the annotations:

The @Endpoint annotation acts similarly to the @Component annotation, marking this class as a service endpoint implementation, and is picked up thanks to the sws:annotation-driven configuration element. It is analogous to the @WebService annotation.
The @Autowired annotation configures the injection point for the context InstitutionService bean.
The @PayloadRoot annotation is the counterpart to the JAX-WS @WebMethod annotation. Similarly, it sets up the namespace and name of the expected type for a request.
The @ResponsePayload annotation indicates that the method's return value should be bound to the response payload.
The @RequestPayload annotation indicates that a method parameter should be bound to the request payload.
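One way to picture what @PayloadRoot sets up is a lookup table keyed by the qualified name of the payload's root element. The sketch below models that idea with javax.xml.namespace.QName and a plain map. It is a simplified mental model under my own assumptions, not the actual Spring-WS endpoint mapping code; PayloadRootDispatchSketch, register() and dispatch() are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.namespace.QName;

class PayloadRootDispatchSketch {
    static final String TARGET_NAMESPACE = "http://fwc.gitter.org/services/";

    // Handlers keyed by (namespace, localPart) of the payload root element.
    private final Map<QName, Function<String, String>> handlers = new HashMap<>();

    void register(String namespace, String localPart, Function<String, String> handler) {
        handlers.put(new QName(namespace, localPart), handler);
    }

    // Finds the handler for the payload root, or fails when nothing is mapped.
    String dispatch(String namespace, String localPart, String payload) {
        Function<String, String> handler = handlers.get(new QName(namespace, localPart));
        if (handler == null) {
            throw new IllegalStateException(
                    "No endpoint mapping for {" + namespace + "}" + localPart);
        }
        return handler.apply(payload);
    }

    public static void main(String[] args) {
        PayloadRootDispatchSketch dispatcher = new PayloadRootDispatchSketch();
        dispatcher.register(TARGET_NAMESPACE, "findInstitutionsRequest",
                keyword -> "institutions matching " + keyword);
        System.out.println(dispatcher.dispatch(
                TARGET_NAMESPACE, "findInstitutionsRequest", "Milwaukee WI"));
    }
}
```

QName compares both the namespace URI and the local part, which is why a request with the right element name but the wrong namespace finds no handler.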

As you may have noticed, this service uses the same transfer objects as its EJB3/JAX-WS cousin. It is therefore using the same JAXB configuration for marshaling the request and response payloads. This, then, is all that is needed. Here is an example request and response for this Spring-WS service.

Request

<!-- HTTP Headers omitted -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <fwc:findInstitutionsRequest
                xmlns:fwc="http://fwc.gitter.org/services/">
            <fwc:keyword>Milwaukee WI</fwc:keyword>
        </fwc:findInstitutionsRequest>
    </soap:Body>
</soap:Envelope>

Response

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header/>
  <SOAP-ENV:Body>
    <ns3:findInstitutionsResponse xmlns:ns3="http://fwc.gitter.org/services/" xmlns="">
      <institution>
        <address>500 Silverspring Rd Ste K340 Glendale, WI 53217</address>
        <institutionName>Bryant &amp; Stratton College-Bayshore</institutionName>
      </institution>
      <institution>
        <address>10950 W Potter Road Wauwatosa, WI 53226</address>
        <institutionName>Bryant &amp; Stratton College-Wauwatosa</institutionName>
      </institution>
      <institution>
        <address>4425 N Port Washington Rd Glendale, WI 53212</address>
        <institutionName>Columbia College of Nursing</institutionName>
      </institution>
      <!-- Snip ... -->
    </ns3:findInstitutionsResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Analysis

There are some notable things missing from this article, including some of the reasons that would compel you to use SOAP services instead of REST in the first place. This includes the entire WS-* stack: reliable messaging, use of transports other than HTTP, support for signed/encrypted/authenticated service calls, and transactionality of service calls. I decided to break these concepts out into separate articles so I could do them justice. I'm not sure the gap between EJB3/JAX-WS and Spring-WS is wide enough for the configuration of a simple SOAP service to enable us to make a decision on this alone. However, I did notice a couple things that are at least worth pointing out.

First, I have a confession. When I initially wrote the code for the Spring-WS service, I spent several frustrated hours trying to hammer out a working service, specifically when trying to write up a service implementation that would work with the WSDL that I had cobbled together. My frustration was really due to two things. First, I chose not to do "Contract-First" service development, which would have, more or less, guaranteed that the implementation would have matched the WSDL. Second, my lack of experience with Spring-WS configuration had me battling myself for the first few hours. When I finished getting the service to work, I went through several more passes to understand how the moving parts of Spring-WS were working together and slowly trimmed my configuration down to just the pieces that were necessary to make it work, discovering that some of what I had done was unnecessary or redundant. So hopefully my analysis remains untainted from my early frustrations.

I noticed that configuring the URL of the endpoint in Spring appeared to be more flexible. This may differ slightly when using other containers, but the endpoint of my EJB3/JAX-WS service was /// or /fwc-ejb/soap-institution/InstitutionService. Initially, my serviceName was soap/Institution, but that is not a legal name for a wsdl:service in the WSDL. I got away with it in JBoss, but it would have failed validation, so I changed it. On the other hand, all the Spring SOAP calls were getting routed through the Spring-WS MessageDispatcherServlet which allows any service endpoint mappings you want. I mapped the servlet to "/soap/*" and then in my application context, mapped my wsdl endpoint to "/soap/institution/InstitutionService". What seemed to be a point for Spring-WS turned out to be a side-effect of my choice to use "Contract-Last" development, so it wasn't enough to sway me. If I had generated the wsdl myself, I could have set the endpoint to whatever I wanted.

So which is better? There really weren't any notable differences between the two versions of the implementation. While Spring-WS is certainly a viable option, I was very put off by the fact that my desired Code-First approach wasn't supported. This alone might make me choose to go with JEE. In fact, if you choose to use Spring for other reasons, it is possible to plug in Apache CXF to publish your SOAP services and host the JAX-WS service implementation from the first example via CXF in the Spring container. Honestly, I wasn't expecting to notice any great divergence yet, but I anticipate that I will begin to see some major differences as I delve deeper down the rabbit hole.

EJB3 vs. Spring: Rest Services

Which framework is better?

If anyone has ever asked you if you prefer pizza or ice cream, you probably answered “Well… it depends”. While frustrating, this is often the correct answer when asking which software framework is “better”. So it is when comparing the Spring Framework with the capabilities provided by an EJB3 container. I have decided to start exploring this question myself, in particular the functionality where EJB3 and the various Spring modules overlap.

Clearly this is not a small task, particularly since I am not intimately familiar with all the possibilities offered by the Spring framework. To ease my pain, I am going to break this analysis up into a series of posts, each building upon the last. It is unlikely that any of these posts will be truly exhaustive in content, but I will do my best to represent the “80%”.

Getting started

I will keep my code as platform independent as I am able. Knowing how minor nuances in containers can affect the behavior of deployed Enterprise Applications, I will divulge that I am deploying my application to JBoss EAP 6.3 for testing. If I do happen to include any JBoss proprietary deployment descriptors or configurations, I will call them out and explain the approach.
The Framework Comparison Application (FWC) is a Maven project organized into several modules. All of the source code is available for your perusal on GitHub at https://github.com/jgitter/fwc. The modules are as follows:

fwc: The parent POM module
fwc-ear: The EAR enterprise application module
fwc-common: Java module that contains common code such as interfaces, exceptions and transfer objects
fwc-dao: EJB module containing the DAO layer, accessible to all web applications deployed to the EAR
fwc-web: WAR module that contains the front-end application used for illustration
fwc-ejb-web: WAR module that contains the EJB services. The intent is to encapsulate the EJB3-specific implementations for the purposes of comparison
- fwc-ejb-services: EJB module that implements the actual EJB services
fwc-spring-web: WAR module that contains the Spring services and Spring framework libs. The intent is to encapsulate the Spring-specific implementations for the purposes of comparison
- fwc-spring-services: Java module that implements the actual Spring services

EJB3 and Spring Configuration

One of the myths that seems to persist is the idea that EJB3 uses primarily annotation-based configuration and is therefore, somehow, superior to Spring which uses primarily XML-based configuration. There are two fallacies in this statement that I would like to overcome.

The first fallacy is that annotations are superior to XML for configuration, or vice versa. In fact, it depends highly upon how they are being used, what they are configuring, and ultimately, some level of aesthetics. For instance, I personally think that transaction management, cross-cutting concerns, and web service endpoint configuration are a perfect fit for configuration via annotations. Annotations provide a clear definition of behavior, are self-documenting, and are easy to change. Furthermore, these types of configurations should not be subject to change lightly and may require a redesign of a system, or part of a system. On the other hand, I believe JPA mapping configuration is a great fit for XML configuration. It keeps all traces of your data sources out of your code, allowing you to cleanly separate the two distinct activities of data mapping and data access.

Second, it isn’t true that either EJB3 or Spring lends itself to be primarily one or the other. Both offer options for most configurations to be handled via either XML or annotation. I will attempt to show both options for configuration in my service examples.

Restful EJB3 Service

In order for a Restful JAX-RS service to work, we need to map the javax.ws.rs.core.Application to a URL pattern. There are two ways to achieve this. The first method is via a servlet-mapping definition in your deployment descriptor (web.xml) and should look pretty familiar.

XML Configuration via web.xml:

<web-app>
    <!-- snip -->
    <servlet>
        <servlet-name>javax.ws.rs.core.Application</servlet-name>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>javax.ws.rs.core.Application</servlet-name>
        <url-pattern>/rest/*</url-pattern>
    </servlet-mapping>
</web-app>

The second way to achieve the same result is to use the @ApplicationPath annotation on a POJO. This annotation is discovered upon startup and the container will map the Application servlet for you.

Java Configuration via annotation:

import javax.ws.rs.ApplicationPath;
import javax.ws.rs.core.Application;

@ApplicationPath("/rest")
public class FwcApplication extends Application {}

Note that the path is "/rest" and not "/rest/*" as in the web.xml above. These two configurations are equivalent and result in all calls to /<context-root>/rest/* being routed through the JAX-RS Application servlet.

There are additional options and behaviors that can be configured but I will not cover those here. This is enough to give us basic JAX-RS services. Here is the EJB3 web service implementation:

The EJB3 service implementation:

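import java.util.List;
import java.util.Map;

import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/institution")
@Stateless
public class InstitutionRestController {

    @Inject
    private InstitutionService service;

    @GET
    @Path("{keyword}")
    @Produces(MediaType.APPLICATION_JSON)
    public List<Map<String, String>> findInstitutions(
            @PathParam("keyword") String keyword) {
        return service.findInstitutions(keyword);
    }
}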

There are a few things happening here.

The @Path annotation provides the relative path to make calls to the exposed methods of this restful service. Notice that the annotation is on the class as well as the method itself. For this example, the path to the findInstitutions service method is /<context_root>/fwc-ejb/rest/institution/{keyword} where {keyword} is any non-null string containing a search term.
The @Inject is a CDI annotation. Dependency Injection is not the subject of this article, so I will just say that this annotation can be used to inject any CDI bean into a container-managed resource. In this case, InstitutionService is a POJO decorated with the CDI annotation @RequestScoped. See the CDI documentation for more information on the subject.
The @GET annotation informs the JAX-RS implementation that this method can only be accessed via the HTTP GET method.
The @Produces annotation informs the JAX-RS implementation that this method will result in a response that contains JSON data. JBoss uses Resteasy under the hood, which will transform the returned object to JSON.
The @PathParam annotation informs the JAX-RS implementation that the keyword parameter should be parsed from the URL path wherever the {keyword} path is configured.

And that's all there is to it. This is enough to expose a simple Restful web service. I did not mention the @Stateless annotation, which marks this service as a Stateless EJB3 Session Bean. This is not necessary to expose this as a service, but comes with a few benefits. For instance, the container will create a pool of reusable service objects for handling requests, enable injection since the bean is container-managed, and expose the bean through JNDI for local service calls. Simply adding an @Remote interface would also expose it for remote invocation.
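If the {keyword} placeholder feels a bit magical, it helps to think of a path template as a regular expression with one capture group per variable. The sketch below shows that idea in plain Java. It is a simplified illustration, not how Resteasy implements template matching, and it ignores details such as URL decoding and custom variable patterns; PathTemplateSketch and match() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathTemplateSketch {
    private static final Pattern VARIABLE = Pattern.compile("\\{(\\w+)\\}");

    private final List<String> names = new ArrayList<>();
    private final Pattern pattern;

    // Turns a template such as /institution/{keyword} into a regular expression
    // with one capture group per variable.
    PathTemplateSketch(String template) {
        Matcher m = VARIABLE.matcher(template);
        StringBuilder regex = new StringBuilder();
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(template.substring(last, m.start())));
            regex.append("([^/]+)"); // a variable spans exactly one path segment
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        pattern = Pattern.compile(regex.toString());
    }

    // Returns the variable values by name, or null when the path does not match.
    Map<String, String> match(String path) {
        Matcher m = pattern.matcher(path);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            values.put(names.get(i), m.group(i + 1));
        }
        return values;
    }

    public static void main(String[] args) {
        PathTemplateSketch template = new PathTemplateSketch("/institution/{keyword}");
        System.out.println(template.match("/institution/Milwaukee"));
        System.out.println(template.match("/institution/"));
    }
}
```

An empty segment does not match in this sketch, which lines up with the requirement that the keyword be a non-null search term.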

Restful Spring Service

Configuring a container for Spring Web MVC Restful services is very similar to the previous example. The first step is to map the Spring DispatcherServlet, which acts as the top-level controller for your services, much as the Application servlet did for the JAX-RS implementation. This is done via the deployment descriptor (web.xml).

Spring DispatcherServlet mapping:

<web-app>
    <servlet>
        <servlet-name>dispatcher</servlet-name>
        <servlet-class>
            org.springframework.web.servlet.DispatcherServlet
        </servlet-class>
        <load-on-startup>1</load-on-startup>
    </servlet>

    <servlet-mapping>
        <servlet-name>dispatcher</servlet-name>
        <url-pattern>/rest/*</url-pattern>
    </servlet-mapping>

    <listener>
        <listener-class>
            org.springframework.web.context.ContextLoaderListener
        </listener-class>
    </listener>
</web-app>

This will map all requests to /<context_root>/rest/* through the DispatcherServlet. Unlike the previous example, we aren't done configuring the container here since the Spring container still needs to be made aware of your service controllers. There are a number of ways to do this, but the easiest is to update the configuration for your context. The ContextLoaderListener will bootstrap your application context upon startup of the container, which will look for the applicationContext.xml. You can configure this here.

applicationContext.xml:

<beans>
    <mvc:annotation-driven />
    <context:component-scan base-package="org.gitter.fwc" />
    <!-- snip -->
</beans>

This can be a little deceptive as there is a lot going on here. The <mvc:annotation-driven> tag enables annotation configuration for your Restful Controllers via the @Controller, @RestController, @RequestMapping, and @ExceptionHandler annotations and provides the standard conversion service for serializing your data for transfer. It also sets up the HttpMessageConverter which takes care of your @RequestBody and @ResponseBody annotations. In short, one line turns on all of your Spring Web MVC annotations. Alternatively I could leave this off and configure my entire controller via xml, however I consider that configuration to be somewhat tedious and counter-productive, so I'm not going to show it here.

The <context:component-scan> tag activates annotation processing for your Spring beans and scans the indicated package for bean annotations, such as @Component, @Service, and @Repository, so that the annotated classes are registered automatically. This is a shortcut that allows you to skip the tedious, manual creation of xml configuration for registering simple beans. For example, I don't need to register my InstitutionService.

Not Needed:

<beans>
    <!-- snip -->
    <bean name="institutionService" class="org.gitter.fwc.InstitutionService">
        <property name="dao" ref="institutionDAO" />
    </bean>
</beans>

As promised, here is an annotation configuration alternative to the applicationContext.xml configuration above.

import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.EnableWebMvc;

@Configuration
@EnableWebMvc
@ComponentScan(basePackages = { "org.gitter.fwc" })
public class WebMvcConfiguration {}
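For anyone curious what a component scan does at its core, here is a deliberately tiny sketch: look at candidate classes, keep the annotated ones, instantiate them and register them under a decapitalized name. Real Spring discovers candidates by scanning the classpath and does a great deal more; here the candidates are passed in by hand, and the @Managed annotation, the nested stub classes and the register() method are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

class ComponentScanSketch {

    // Hypothetical stand-in for a stereotype annotation such as @Component.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Managed {}

    @Managed
    static class InstitutionService {}

    static class NotABean {}

    // Instantiates every annotated candidate and registers it by bean name.
    static Map<String, Object> register(Class<?>... candidates) {
        Map<String, Object> registry = new LinkedHashMap<>();
        for (Class<?> candidate : candidates) {
            if (!candidate.isAnnotationPresent(Managed.class)) {
                continue; // unannotated classes are ignored
            }
            try {
                Object bean = candidate.getDeclaredConstructor().newInstance();
                registry.put(beanName(candidate), bean);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create " + candidate.getName(), e);
            }
        }
        return registry;
    }

    // Simple class name with a lower-case first letter, e.g. institutionService.
    static String beanName(Class<?> type) {
        String simple = type.getSimpleName();
        return Character.toLowerCase(simple.charAt(0)) + simple.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(register(InstitutionService.class, NotABean.class).keySet());
    }
}
```

The registered name ends up the same as the bean name used in the "Not Needed" XML above, which is the point: the annotation replaces that hand-written bean definition.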

Now that everything is configured, we just need a service implementation.

Spring Rest Service Implementation:

import java.util.List;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/institution")
public class InstitutionRestController {

    @Autowired
    private InstitutionService service;

    @RequestMapping(value = "/{keyword}", method = RequestMethod.GET)
    public @ResponseBody List<Map<String, String>> find(
            @PathVariable("keyword") String keyword) {
        return service.findByKeyword(keyword);
    }
}

Here's the breakdown

The @Controller annotation marks it as a Web MVC Controller and is picked up by the context component scan so it can be registered as a Spring bean.
The @RequestMapping annotation allows us to set a URL mapping to our web methods. This is analogous to the @Path annotation in JAX-RS. In addition, we are mapping the GET method via the method attribute of this annotation, as opposed to the @GET annotation in JAX-RS.
The @Autowired annotation is used to inject one Spring resource into another. Again, Dependency Injection is not the subject of this article, so I won't go into detail. However, my InstitutionService is marked as a @Component so it was picked up by the context component scan and registered as a spring bean, enabling it for auto wired injection.
The @ResponseBody annotation activates the HttpMessageConverter when encountered during a service method call to convert the returned object to an HTTP response. Alternatively, as of Spring 4, you can put @ResponseBody on the class level, or use @RestController instead of @Controller. Either of these will cause the HttpMessageConverter to be invoked for all of the service methods so you don't have to annotate each method.
The @PathVariable annotation is analogous to the @PathParam JAX-RS annotation and simply maps the parameter to a URL path variable named "{keyword}".
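To demystify the conversion step that @ResponseBody triggers, here is a hand-rolled sketch that turns the controller's return type, a List<Map<String, String>>, into a JSON string. In practice Spring delegates this to a library found on the classpath, so treat this purely as an illustration of the job being done; JsonBodySketch, toJson() and quote() are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonBodySketch {

    // Serializes a list of string maps as a JSON array of objects.
    static String toJson(List<Map<String, String>> rows) {
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append('{');
            int written = 0;
            for (Map.Entry<String, String> entry : rows.get(i).entrySet()) {
                if (written++ > 0) {
                    out.append(',');
                }
                out.append(quote(entry.getKey())).append(':');
                out.append(entry.getValue() == null ? "null" : quote(entry.getValue()));
            }
            out.append('}');
        }
        return out.append(']').toString();
    }

    // Wraps a string in double quotes and escapes the characters JSON requires.
    static String quote(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // remaining control characters become 4-digit hex escapes
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("INSTNM", "Columbia College of Nursing");
        System.out.println(toJson(Collections.singletonList(row)));
    }
}
```

Note that the ampersand in a name like "Bryant & Stratton" needs no escaping in JSON, unlike in the SOAP responses from the earlier article.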

Analysis

One topic I did not cover is the customization for handling of request and response objects. However, most JAX-RS implementations and Spring will use 3rd party libraries for marshaling and unmarshaling that data anyway. Suffice it to say, it is possible to do this using either method through fairly simple configuration so I'm not going to go into detail for the sake of brevity.

Another topic not addressed in the examples is that of bean scope. It can be advantageous to scope a bean's lifecycle to a user's HTTP Session, instead of to the request itself. The ubiquitous example is the shopping cart. You don't want to have to recreate the entire cart each time the user adds an item, nor do you want to have to serialize and pass that entire cart back and forth from server to client or store it on the client. There are many ways to preserve the state of this cart, but one easy way to handle this is through the use of a bean scoped to the HTTP Session. Again, there is no clear advantage. Spring offers the ability to scope your beans via the scope attribute of the bean tag in xml or via the @Scope annotation on your beans, though you must use a web-aware application context such as the XmlWebApplicationContext or AnnotationConfigWebApplicationContext; a plain ClassPathXmlApplicationContext does not know about the session scope. While EJB 3 itself does not do this, you can achieve this through the use of the @SessionScoped annotation from CDI on CDI beans which can be injected into your services.
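Stripped of the framework machinery, a session-scoped bean is just "one instance per session id, created on first use". The sketch below captures that with a concurrent map. It is a mental model only, not how Spring or CDI implement scopes (they work with proxies and the real HttpSession); SessionScopeSketch, Cart and cartFor() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionScopeSketch {

    // The bean whose state should survive across requests of one session.
    static class Cart {
        private final List<String> items = new ArrayList<>();

        void add(String item) {
            items.add(item);
        }

        List<String> items() {
            return items;
        }
    }

    private final Map<String, Cart> cartsBySession = new ConcurrentHashMap<>();

    // Returns the session's cart, creating it the first time it is asked for.
    Cart cartFor(String sessionId) {
        return cartsBySession.computeIfAbsent(sessionId, id -> new Cart());
    }

    // Mirrors session expiry: the bean is discarded along with the session.
    void invalidate(String sessionId) {
        cartsBySession.remove(sessionId);
    }

    public static void main(String[] args) {
        SessionScopeSketch scope = new SessionScopeSketch();
        scope.cartFor("session-a").add("book");
        scope.cartFor("session-a").add("pen");
        System.out.println(scope.cartFor("session-a").items().size());
        System.out.println(scope.cartFor("session-b").items().size());
    }
}
```

Two requests carrying the same session id see the same cart, while a different session starts with an empty one.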

So which is better? I set out to elaborate upon the cliché "it depends" answer, but I have ironically arrived back at the same conclusion. In short, while Spring Web MVC is easy to learn and even fun to use, it offers no clear advantage. It is my opinion that you don't choose to build a web application with Spring to get Web MVC. However, if you've already chosen to use Spring and want to add restful web services, Web MVC is probably the way to go. If you're adding Spring to an existing application with restful web services, you may actually be better served by hooking the existing framework (e.g. Resteasy or Jersey) into the Spring Web module instead of reconfiguring everything for Web MVC. This idea is discussed in Spring's documentation at http://docs.spring.io.