WCM and XML Processing

One thing that we’ve been pretty good at since the launch of Alfresco WCM 18 months ago is our eating of the proverbial dog-food, in that we use the product itself to author, review and push out updates to our web site.

The majority of the site is relatively static in nature and is therefore pre-baked by our templates. Where JSP is used it is mostly for simple tasks such as including common header and footer components and looking up message strings, but there are some areas of the site which do have more dynamic data.

Up until now these areas – including our events, press releases and training pages – have leveraged some relatively simple but custom-built Java libraries that allow data from the underlying XML files to be queried and presented as appropriate, similar to the press release example bundled with the Alfresco example site in WCM.

This approach has worked well, but is a barrier to adding new functionality to the site since extra code must be developed to handle new data types. Whereas data can be queried easily at rendering time using the getXMLDocuments() function exposed to Freemarker and Xalan, this is more difficult to do at request time, requiring either an Alfresco runtime environment on the delivery side or extra code, as above.

I wanted to find a way to allow web developers to easily implement this functionality using JSP and JSTL only and the result has been an XML data processing library that can be leveraged by a JSP page to dynamically pull in and process XML data to assemble the page.

As yet there is no snazzy name, but suggestions are welcome! The library is a single JAR file, plus a tag library descriptor (TLD) that exposes the necessary functions to the JSP.

Under the covers, the library uses functionality from J2SE 5.0’s XPath classes to parse XML content and is capable of loading data from within a standalone servlet container or in the context of the Alfresco virtualisation server, via AVMRemote.
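As a rough illustration of the kind of XPath lookup involved, here is a minimal standalone sketch using the javax.xml.xpath classes. It is not the library's actual code: the class name, the sample document and the namespace URI bound to the alfdotcom prefix are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Hypothetical namespace URI; the one used by the real web forms is not given here.
    static final String NS = "http://example.org/alfdotcom";

    /** Parses the XML string and returns the string value of the XPath expression. */
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for prefixed XPath steps to match
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "alfdotcom" prefix used in the expressions to the namespace URI.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "alfdotcom".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "alfdotcom" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri)
                        ? Collections.singletonList("alfdotcom").iterator()
                        : Collections.<String>emptyList().iterator();
            }
        });
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<alfdotcom:pressrelease xmlns:alfdotcom=\"" + NS + "\">"
                + "<alfdotcom:title>Example release</alfdotcom:title>"
                + "</alfdotcom:pressrelease>";
        System.out.println(evaluate(xml, "/alfdotcom:pressrelease/alfdotcom:title"));
    }
}
```

The important detail is the namespace-aware parser plus a NamespaceContext; without both, prefixed expressions such as /alfdotcom:pressrelease will not match anything.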

Binary downloads are available as a ZIP containing all the files needed, with the Java source also available.

To install this in an existing webapp, you will need to do the following:

Step 1 – Set up the webapp

Drop alfrescowww-xml.jar into your WEB-INF/lib folder and alf-xml.tld into WEB-INF. You will also need to reference the TLD from within the <web-app> element of web.xml, as follows, so that the container picks it up.

<taglib>
<taglib-uri>http://www.alfresco.org/jsp/alf-xml</taglib-uri>
<taglib-location>/WEB-INF/alf-xml.tld</taglib-location>
</taglib>

If you are using JSTL you should also have a section like this, with c.tld and jstl.jar installed as above.

<taglib>
<taglib-uri>http://jakarta.apache.org/taglibs/core</taglib-uri>
<taglib-location>/WEB-INF/c.tld</taglib-location>
</taglib>

Step 2 – Set up the JSP

Add the namespace to your JSP page to allow it to reference the data functions, as follows.

<jsp:root version="1.2"
xmlns:jsp="http://java.sun.com/JSP/Page"
xmlns:c="http://java.sun.com/jsp/jstl/core"
xmlns:alfxml="http://www.alfresco.org/jsp/alf-xml">

This assumes you are using XML syntax within your JSPs and that you are also using JSTL’s core components on your page, with these fully installed. If not, you will need to modify this code appropriately.

Step 3 – Get XML data

Assuming your webapp starts OK at this stage, you should now be able to use the alfxml:getXMLDocuments() function to pull in XML data from the JSP, using something like the following.

<c:forEach
    items="${alfxml:getXMLDocuments(pageContext, '/media/releases', false, '/alfdotcom:pressrelease', '/alfdotcom:pressrelease/alfdotcom:launch_date', 2)}"
    var="pr">
  <div class="list-item">
    <h3>
      <a href="/media/releases/${pr['$xml_file_name']}.jsp">
        <c:out value="${pr['/alfdotcom:pressrelease/alfdotcom:title']}" />
      </a>
    </h3>
    <p class="summary"><c:out value="${pr['/alfdotcom:pressrelease/alfdotcom:abstract']}" /></p>
  </div>
</c:forEach>

In order, the arguments to the function are as follows: the implicit JSP pageContext object; the virtual directory path to search for XML assets under; a boolean value indicating whether or not the search should be deep (i.e. descend into subdirectories); an XPath condition which must evaluate to true (e.g. to only match documents with an alfdotcom:pressrelease root element); an XPath expression, the value of which is used to sort documents (use null or an empty string if no sorting is needed); and lastly an integer indicating the desired sort order (1=increasing, 2=decreasing).

Here I’ve used JSTL’s c:forEach element to iterate over the collection of items returned, but other mechanisms are possible. The code above is a pretty simple example of using EL to fetch field values from the XML based on simple XPath expressions, but more complex expressions can be built up in the same way. I’ll post more examples another time.
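To make the filter and sort arguments a little more concrete, here is a small Java sketch of the general idea: keep only the documents for which the XPath condition is true, then order them by the string value of the sort expression. This is an illustration under my own assumptions rather than the library's implementation; the class and method names are invented, the sample XML drops the namespace prefix to keep things short, and sorting is a plain string comparison (which happens to work for ISO-style dates).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FilterSortSketch {
    // Sort order codes as described in the text: 1 = increasing, 2 = decreasing.
    static final int INCREASING = 1;
    static final int DECREASING = 2;

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** String value of an XPath expression, with the checked exception wrapped. */
    static String value(Document doc, String expr) {
        try {
            return XPATH.evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    /** Keeps documents matching the condition, then sorts them by sortExpr if given. */
    static List<Document> select(List<Document> docs, String condition,
                                 String sortExpr, int order) throws Exception {
        List<Document> matched = new ArrayList<Document>();
        for (Document doc : docs) {
            // A node-set result converts to true when it is non-empty.
            if ((Boolean) XPATH.evaluate(condition, doc, XPathConstants.BOOLEAN)) {
                matched.add(doc);
            }
        }
        if (sortExpr != null && sortExpr.length() > 0) {
            Comparator<Document> byKey = Comparator.comparing(d -> value(d, sortExpr));
            Collections.sort(matched,
                    order == DECREASING ? Collections.reverseOrder(byKey) : byKey);
        }
        return matched;
    }

    public static void main(String[] args) throws Exception {
        List<Document> docs = Arrays.asList(
            parse("<pressrelease><title>A</title><launch_date>2008-01-10</launch_date></pressrelease>"),
            parse("<event><title>Not a release</title></event>"),
            parse("<pressrelease><title>B</title><launch_date>2008-03-05</launch_date></pressrelease>"));
        for (Document d : select(docs, "/pressrelease", "/pressrelease/launch_date", DECREASING)) {
            System.out.println(value(d, "/pressrelease/title"));
        }
    }
}
```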

Microsoft threats

Off the back of the latest Labs 3.0 Beta coverage I came across Microsoft’s Q3 SEC filing, via an article on Redmondmag.com.

The entire text is downloadable in (surprise, surprise) Word format – though not in Office 2007 format. No interesting metadata unfortunately, either.

Within the text, particularly interesting in an open source context is Part 1A of the “Other Information” section which lists the risk factors for the company. It’s worth reading the entire section if you’re interested in the threats to Microsoft, but the first three are the most relevant.

Item 1 on open source and SaaS:

Challenges to our business model may reduce our revenues and operating margins. Our business model has been based upon customers paying a fee to license software that we developed and distributed. Under this license-based software model, software developers bear the costs of converting original ideas into software products through investments in research and development, offsetting these costs with the revenue received from the distribution of their products. In recent years, certain “open source” software business models have evolved into a growing challenge to our license-based software model. Open source commonly refers to software whose source code is subject to a license allowing it to be modified, combined with other software and redistributed, subject to restrictions set forth in the license. A number of commercial firms compete with us using an open source business model by modifying and then distributing open source software to end users at nominal cost and earning revenue on complementary services and products. These firms do not have to bear the full costs of research and development for the software. Some of these firms may build upon Microsoft ideas that we provide to them free or at low royalties in connection with our interoperability initiatives.  A prominent example of open source software is the Linux operating system. Proponents of open source software continue efforts to convince governments worldwide to mandate the use of open source software in their purchase and deployment of software products.  Although we believe our products provide customers with significant advantages in security, productivity, and total cost of ownership, the open source software model continues to pose a significant challenge to our business model. To the extent open source software gains increasing market acceptance, sales of our products may decline, we may have to reduce the prices we charge for our products, and revenue and operating margins may decline.

Another development is the software-as-a-service business model, under which companies provide applications, data, and related services over the Internet. Providers use primarily advertising or subscription-based revenue models. Recent advances in computing and communications technologies have made this model viable and enabled the rapid growth of some of our competitors. We are devoting significant resources toward developing our own competing software plus services strategies. It is uncertain whether these strategies will be successful.

Item 2, on competition from smaller outfits and community-based groups:

We face intense competition. We continue to experience intense competition across all markets for our products and services. Our competitors range in size from Fortune 100 companies to small, specialized single-product businesses and open source community-based projects. Although we believe the breadth of our businesses and product portfolio is a competitive advantage, our competitors that are focused on narrower product lines may be more effective in devoting technical, marketing, and financial resources to compete with us. In addition, barriers to entry in our businesses generally are low and products, once developed, can be distributed broadly and quickly at relatively low cost. Open source software vendors are devoting considerable efforts to developing software that mimics the features and functionality of our products, in some cases on the basis of technical specifications for Microsoft technologies that we make available. In response to competition, we are developing versions of our products with basic functionality that are sold at lower prices than the standard versions. These competitive pressures may result in decreased sales volumes, price reductions, and/or increased operating costs, such as for marketing and sales incentives, resulting in lower revenue, gross margins and operating income.

And Item 3, on their dependence on tightly protecting their IP through patents and other mechanisms:

We may not be able to adequately protect our intellectual property rights. Protecting our global intellectual property rights and combating unlicensed copying and use of software and other intellectual property is difficult. While piracy adversely affects U.S. revenue, the impact on revenue from outside the U.S. is more significant, particularly in countries where laws are less protective of intellectual property rights. Similarly, the absence of harmonized patent laws makes it more difficult to ensure consistent respect for patent rights. Throughout the world, we actively educate consumers about the benefits of licensing genuine products and obtaining indemnification benefits for intellectual property risks, and we educate lawmakers about the advantages of a business climate where intellectual property rights are protected. However, continued educational and enforcement efforts may fail to enhance revenue. Reductions in the legal protection for software intellectual property rights or additional compliance burdens could both adversely affect revenue.

Repair web project web script

This is actually two small web scripts, which between them allow you to select an Alfresco web project and fix up the metadata on all form generated assets, including renditions. I wrote it to allow me to reassociate assets with their forms after using CIFS to do an export and import of some sites I’ve been working on.

The functionality is similar to that provided by Uzi’s Web Project Tools AMP, which isn’t yet building against 2.2. Rather than being Java-based, though, the scripts use the JavaScript API against the AVM – which should remain stable over time.

At the moment the script iterates through the site directory structure looking for any XML item not associated with a web form. If a form with a matching root tag is found associated with the web project, the appropriate aspects and properties are added to the node and to any existing renditions.
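The matching step itself is simple. The web scripts do it in JavaScript against the AVM, but the idea can be sketched in plain Java as below. This is only an illustration: the class, the map of root tags to form names and the sample data are hypothetical, and no Alfresco API is used.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FormMatchSketch {
    /** Returns the qualified name of the document's root element, e.g. "alfdotcom:pressrelease". */
    static String rootTag(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    /**
     * Looks up the form whose root tag matches the document's root element.
     * Returns null when no form matches, in which case the asset is left alone.
     */
    static String matchForm(String xml, Map<String, String> formsByRootTag) throws Exception {
        return formsByRootTag.get(rootTag(xml));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical forms configured on the web project, keyed by root tag.
        Map<String, String> forms = new HashMap<String, String>();
        forms.put("alfdotcom:pressrelease", "Press Release");
        forms.put("alfdotcom:event", "Event");

        String asset = "<alfdotcom:pressrelease xmlns:alfdotcom=\"http://example.org/alfdotcom\"/>";
        System.out.println(matchForm(asset, forms));
    }
}
```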

Unlike with the Web Project Tools AMP, existing form metadata cannot be removed or repaired, but I quite like the simplicity of this approach. Adding the ability to do this would perhaps be useful, but would complicate the UI and wasn’t required for my purposes. I might add this later, in addition to the ability to preview the changes.

Right now it’s recommended to ensure the current user’s sandbox is empty before executing the script, so that you can easily undo all changes should your data get munged. The script has been tested against a couple of web projects that I have access to but as it has not yet had widespread testing it should be used with caution!

Note, you need to have a sandbox for the web project you want to repair. If you do not then the site will not show up in the initial list.

To install the web script, extract the contents of the ZIP file somewhere in your web script package hierarchy, either on the classpath or inside the Data Dictionary – org/alfresco/wcm works for me. Remember you’ll need to restart Alfresco, or at least refresh the list of web scripts, to get Alfresco to pick it up.

repair-web-project.zip

Comments welcome!

Update: Thanks to Nancy for pointing me at the Developer Tools area of the Content Community – the code is also now available there.

Stop the rot

Various reasons. I’ve been busy at work, busy at Glastonbury, busy flying around various European countries. The usual. My absence from the blogosphere has not been helped by the fact that Facebook has recently been giving me that quick fix of Internet-based broadcast expression, with a lot less effort than is required to actually sit down and write something. But it’s time to stop the rot.

For once, I’ve had a weekend that I actually want to pen something about – and enough time on a slightly grey-looking Sunday evening to do so. Last weekend was all about G’s birthday weekend, too much Pimms and recovering from the effects whilst paddling around Chichester harbour and beyond in a vessel clearly not designed for such purposes but which worked surprisingly well.

This weekend has been similar, but with the Saturday festivities and socialising having been moved 40 miles south down to Brighton and the Sunday paddle consisting of the slightly more challenging 18 mile Maidstone to Tonbridge marathon, the latter having been completed in 2 hours, 48 minutes and 26 seconds (though estimated to be some three and a half minutes short due to the closure of the river at the last portage 🙁 ).

Brighton was good. Sufficiently different from the last time round, a scary five years ago. There was still plenty of Park-based fun, the rather gusty yet still utterly fantastic beach – where fish and chips were eaten – a little bit of drinking, and plenty of meeting new people. My pictures are about to go on FB. They no doubt miss a large part of the evening after I trundled back off to London, but I must say I’m still rather happy with them.

Next time it won’t be so long 🙂

Deployment Testing

One of the things we’ve improved massively in the forthcoming 2.2 release of Alfresco is the configuration and management of deployment targets. I’ve been playing around with this the last couple of days, using VMware to provide a couple of target hosts for testing.

wcm_configure_deployment.png

The file system deployer is really easy to set up – if you have Java already installed on the system then it’s just a matter of unzipping the deployer package, chmod’ing the shell script and firing it up. This starts a lightweight background Java process which listens via RMI for incoming connections from the authoring server.

I had a couple of issues myself, probably more related to my environment than anything else. Firstly I had to disable IPv6 on the target host, since it was causing the receiver to bind to IPv6 only rather than IPv4. Although I could telnet from my main OS and from other VMs onto port 44100, Alfresco was throwing back ConnectionRefusedExceptions, which I assume is some limitation of RMI.

To do this on the Debian VM, I updated the /etc/modprobe.d/aliases file, commenting out the following line:

alias net-pf-10 ipv6

and adding in the following replacement:

alias net-pf-10 off

This solved the first problem, but I then started getting errors indicating that Alfresco couldn’t connect to the host 127.0.1.1, seemingly something to do with the RMI server not picking up the proper IP address for the host. Rather than spend ages fiddling with the network settings, I simply added a -Djava.rmi.server.hostname argument to the command in deploy_start.sh, i.e.

nohup java -server -cp alfresco-deployment.jar:spring-2.0.2.jar:commons-logging-1.0.4.jar:alfresco-core.jar:jug.jar:. -Djava.rmi.server.hostname=192.168.60.130 org.alfresco.deployment.Main application-context.xml >deployment.log 2>&1 &

If your network configuration is sane and you’re not running under VMware you probably won’t need this at all. If you do, make sure you change the IP address in the argument to match that of the target host.
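If you want to check whether a host is likely to hit the same problem, a quick diagnostic is to see what Java thinks the local address is. As far as I understand it, when java.rmi.server.hostname is not set RMI falls back to the local host's IP address, so a loopback result here (such as 127.0.1.1) would explain the symptoms. The class below is just a throwaway sketch of that check, not part of the deployer.

```java
import java.net.InetAddress;

public class RmiHostCheck {
    /** True if the given literal address or host name resolves to a loopback address. */
    static boolean isLoopback(String host) throws Exception {
        return InetAddress.getByName(host).isLoopbackAddress();
    }

    public static void main(String[] args) throws Exception {
        // The address the JVM resolves for this machine's own host name.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("Local host resolves to: " + local.getHostAddress());

        if (local.isLoopbackAddress()) {
            // Remote clients cannot reach a loopback address, so RMI needs an explicit hint.
            System.out.println("Loopback address detected; consider setting "
                    + "-Djava.rmi.server.hostname=<reachable IP of this host>");
        }
    }
}
```

Run it on the target host with the same JVM that starts the deployment receiver.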

Five minutes and three thousand assets later and apparently I had a successful deployment. Not bad.

wcm_monitor_deployment_success.png

The open road

Well done to John and John on the amazing article in the Guardian’s Technology section today. We made it onto page three!

Sadly the online version lacks the pretty pictures in the paper copy on my desk, but it’s a well-written piece, even if it does paint rather a depressing picture of Governmental take-up of open source in the UK.

GNOME 2.20

This is a great example of why I love free software – with the latest version of GNOME out the door, Evolution now helpfully warns you if you try to send an e-mail containing the word “attached” or similar but neglect to actually attach a file to the message.

Is that not the kind of simple, yet brilliant feature that when you hear about it makes you wonder why nobody’s thought of it before? Amazing.