Alfresco and ImageMagick on OSX

Since I migrated away from Windows on my development machine a couple of months back, I’ve not looked back. One of the main benefits I’ve found is being able to build software packages locally, and Homebrew makes that easy. Now I can easily grab the latest version of various dependencies such as MySQL and ImageMagick and install them in seconds, just as if I was using yum or apt-get.
However, one problem has plagued me since I started using the Homebrew-installed version of ImageMagick in conjunction with my local Alfresco installs, with errors such as this in the logs

 2012-08-02 10:20:10,547  DEBUG [transform.magick.AbstractImageMagickContentTransformerWorker] [main] org.alfresco.service.cmr.repository.ContentIOException: 07020000 Failed to perform ImageMagick transformation:
Execution result:
   os:         Mac OS X
   command:    [/usr/local/bin/convert, /Users/wabson/Development/projects/share-extras-2/software/tomcat/temp/Alfresco/ImageMagickContentTransformerWorker_init_source_4067880520213204197.gif[0], /Users/wabson/Development/projects/share-extras-2/software/tomcat/temp/Alfresco/ImageMagickContentTransformerWorker_init_target_7684811633126657147.png]
   succeeded:  true
   exit code:  133
   out:        
   err:        dyld: Symbol not found: __cg_jpeg_resync_to_restart
  Referenced from: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ImageIO.framework/Versions/A/ImageIO
  Expected in: /usr/local/lib/libjpeg.8.dylib
 in /System/Libra

This frustrated me for a while. It happened every time Alfresco tried to call convert in order to effect an image transform – for instance generating doclib thumbnails – but running the exact same command at a bash prompt worked fine.
Various internet posts all pointed to the environment in which ImageMagick is run as being the cause for such problems. It seemed the environment variable DYLD_LIBRARY_PATH in particular was known to cause problems on OS X, but checking /etc/profile, /etc/bashrc and the custom alfresco.sh script I use to start up Alfresco (included in my Tomcat packages) did not yield any trace of such a variable.
Finally I found a comment in ALF-13452, which led me to check the Spring config within the thirdparty subsystem, responsible for starting up ImageMagick.

   <bean id="transformer.worker.ImageMagick">
      <property name="mimetypeService">
         <ref bean="mimetypeService" />
      </property>
      <property name="executer">
         <bean name="transformer.ImageMagick.Command">
            <property name="commandsAndArguments">
               <map>
                  <entry key=".*">
                     <list>
                        <value>${img.exe}</value>
                        <value>${source}</value>
                        <value>SPLIT:${options}</value>
                        <value>${target}</value>
                     </list>
                  </entry>
               </map>
            </property>
            <property name="processProperties">
               <map>
                  <entry key="MAGICK_HOME">
                     <value>${img.root}</value>
                  </entry>
                  <entry key="DYLD_LIBRARY_PATH">
                     <value>${img.dyn}</value>
                  </entry>
                  <entry key="LD_LIBRARY_PATH">
                     <value>${img.dyn}</value>
                  </entry>
               </map>
            </property>
            <property name="defaultProperties">
               <props>
                  <prop key="options"></prop>
               </props>
            </property>
         </bean>
      </property>
     ...

There it was. The subsystem itself was setting up a few environment variables before firing up the external process, and these were defined in processProperties. One was the DYLD_LIBRARY_PATH variable.
The solution was trivial. I copied the Spring config in tomcat/webapps/alfresco/WEB-INF/classes/alfresco/subsystems/thirdparty/default/imagemagick-transform-context.xml into tomcat/shared/classes/alfresco/extension/subsystems/thirdparty/default/default/imagemagick-transform-context.xml as per the subsystems docs on overriding beans, and commented out the offending <entry> element.

mkdir -p tomcat/shared/classes/alfresco/extension/subsystems/thirdparty/default/default/
cp tomcat/webapps/alfresco/WEB-INF/classes/alfresco/subsystems/thirdparty/default/imagemagick-transform-context.xml tomcat/shared/classes/alfresco/extension/subsystems/thirdparty/default/default/
vim tomcat/shared/classes/alfresco/extension/subsystems/thirdparty/default/default/imagemagick-transform-context.xml

The result – image conversions now work perfectly in my local dev environment using the Homebrew-installed version of ImageMagick.

Share Extras moving to GitHub

Cross-posted to the Share Extras Development Group today.
Share Extras will be moving to a new platform in the next few weeks. The new platform will be GitHub, with the version control obviously provided by Git instead of Subversion, which is used at present.
There are many reasons for moving to Git from Subversion. I’ve spent a while reviewing the project issue list this week and it’s clear to me that there are many enhancement requests which may not be appropriate to make in the main Extras code base, or are simply not achievable within the project due to people’s time constraints.
Git will enable the wider community to fork individual any add-on as they feel the need to. I hope this will lead to more contributions back to the main project and foster greater collaboration (and perhaps a bit of competition?) with others.
Specifically, there is also the need to improve on the localisation of add-ons in a consistent manner, and there are some great tools available for doing that based on Git.
Whilst I had originally intended to simply migrate to a new Git VCS within Google Code, the frequent downtime and increasingly poor feature set and user interface when compared to other providers has led me to conclude that it is not a viable platform for future development.
In contrast, GitHub supports the notion of projects within an overall organisation much better (I have already registered Share Extras as an organisation) and has some great options for improving our documentation using cool standards such as Markdown.
I propose to migrate the current trunk codeline (and perhaps the 1.0 branch should time allow) at the end of August. There are several things that committers need to do before then, in order to help make it a success, such as ensuring that they have a README.md file in all their projects (this will become the main source of documentation, rather than the wiki), and helping to close down any issues that remain open on Google Code (issues will not be migrated over).
As always, if you have any feedback, please let me know. Otherwise I will be in touch with the main committers separately in the coming week or so.

Media Previews is dead, long live Media Viewers

A while ago I wrote about the newly-extensible Document Preview component in Alfresco Share and how you can use it to customize the out-of-the-box viewers and add your own custom implementations.
I mentioned at the end of that post that there were some new examples of custom viewers coming in Share Extras and I’m excited that we’re now ready to release them on the site.
Peter Löfgren had the great idea of using the pdf.js project to perform direct rendering of document content in the web browser using HTML5. I’d already started work on some simpler examples of custom viewers, and on refining the Flash audio/video players that Share Extras previously provided for Alfresco 3.3/3.4.
The result is what we’re calling the Media Viewers add-on. This bundles up a total of six viewer implementations designed to show different ways of implementing custom document views, both with and without Flash.

  • PdfJs displays documents, presentations and any other file capable of being transformed to PDF in-line in the web browser using the excellent pdf.js viewer, which uses the power of HTML5 to remove Share’s Flash dependency for document viewing.
    The viewer supports a number of features not directly supported by the Flash document previewer, such as a sidebar with thumbnail, outline and search views, bookmarking of individual pages of a document, and will remember the page number and zoom level of previous documents that you have viewed.
    PdfJs Viewer
  • FLVPlayer and MP3Player display compatible audio and video files respectively, within the web-browser using the open source FLV Player and MP3 Player media players by neolao. Based on the content’s MIME type, the updated component automatically chooses the appropriate previewer to use.
    While similar to the Flash players provided by Share out-of-the-box, these implementations allow advanced customization of the player via configuration and if FFmpeg is installed, will fire up a transformation to allow viewing of non-H264/FLV video and non-MP3 audio. The user is informed when conversion is in progress and the screen automatically updates when the content can be viewed.
  • Embed uses an in-line iFrame to embed the content itself directly inside the web page. It is suitable for use with content types that can be viewed directly within the web browser such as plain text and PDF, with the Chrome or Acrobat plugins installed. Again, this can be used to avoid the use of the Flash previewer for some clients.
  • Prettify allows formatted code, mark-up and other supported text formats to be displayed in directly in the document and uses the google-code-prettify project to provide an in-line browser-based view with syntax highlighting.
    Prettify Viewer
  • WebODF is an EXPERIMENTAL viewer which uses the AGPL-licensed WebODF project to display ODF content directly in the web browser.
    WebODF cannot be distributed with the add-in itself, so in order to use it you must also download the latest JAR file from the supporting share-webodf project and install it in the same way as the main media-viewers JAR file.

What’s most exciting is that we’re building on top of some great projects such as google-code-prettify, WebODF and pdf.js, that are evolving at a fast pace and changing the way that documents are viewed in a browser environment.
There is plenty more information including download links on the main Media Viewers project page on Share Extras. Please try it out, let us know what you think and help us to improve this collection.

Devizes to Wesminster Results 2012

So, the Devizes to Westminster race results from this Easter weekend are up on the official site, and using ScraperWiki I’ve done some crude analysis.

Comparing with previous years, there seems little change from last year – the average time was just two minutes and five seconds slower. That compares with over two hours’ difference from 2010, when the flow was much stronger.

The one difference that was clear from 2011 was that the larger tide this year had a greater affect on crews’ tideway speeds, as the steeper line for the last checkpoint speed shows. This may have been offset by lower flows on the lower section of the Thames, however.

Comparing the Richmond crews’ times, it’s clear that although all crews seem to go harder off the start than they will throughout the rest of the course, some go harder than others! On the right of the graph you can also see the effect of an injury, missing the top of high water at Teddington and missing the tide completely for the three slowest boats.

Overall the graph shows there were some great performances for Richmond, but the three retirements after Newbury meant we were unfortunately left with a few less boats crossing the finish line than we’d hoped. Well, there’s always next year.

If you’re interested in more then the full data that I used to generate the graphs can be found in this Google Spreadsheet, and you can grab a report for other boats using the ScraperWiki view.

Share Document Previews in Alfresco 4

One of the improvements introduced in Share’s Document Library as part of Alfresco 4 was the refactoring of the web-preview component, which is responsible for rendering the in-line document views in the Document Details page.
These preview capabilities are one of the core strengths of Share, and now these changes allow developers and administrators to tweak the capability as they need.

Flash Document Previewer

Web preview component in Alfresco 4 for a PowerPoint Presentation

The Basics

Let’s start with a quick overview.
The web-preview component is similar to most other Share components. The page component itself is rendered by a web script, which is responsible for loading some basic information about the document being requested, and outputting this into the markup that then forms part of the overall page.
The output does not render HTML directly, but rather instantiates a client-side component Alfresco.WebPreview, passing over the document metadata as an object literal.
Previously, that was it. The logic to set up the Flash previewer was contained within Alfresco.WebPreview, and that was what you got for all documents and images. Anything else that couldn’t be transformed by the repository to a SWF file would not be previewed.

What’s Changed?

Alfresco 4 changes that. Now, the rendering of content items is separated from Alfresco.WebPreview and instead performs the work via a series of plug-ins. The component still takes care of loading the document metadata, setting up the basic screen layout and working out which plug-ins can be used to render the content on-screen. This has several benefits –

  • System administrators can decide which plug-ins should be used under which circumstances, via configuration.
  • Plug-ins are chained together in the configuration. If one fails, then the other available ones are tried in turn, until one succeeds.
  • The implementation of the plug-ins by developers does not need to touch any core files. They are tied into the component via configuration alone.

What’s more, Share comes with a number of different plug-ins already present. It’s likely that you will still see the Flash document previewer for most of the content you look at, but you should notice that by default images are now rendered as real images within the page, and will be automatically resized if they are too big. That’s because images are now handled by a different plug-in, but you can still use the Flash previewer if you reconfigure the component.

Image Viewer

Web preview component in Alfresco 4 for a JPEG image


Other previewers are provided for displaying audio and video content using Strobe Media Player or FlashFox, plus direct playback of supported formats using the HTML5 <video> and <audio> tags. Take a look in the directory components/preview within the Share webapp and you will see them there. It’s worth looking at the source code if you’re thinking of implementing your own custom previewers, as many of them are surprisingly simple.

Configuring Plug-ins

The configuration that dictates which plug-ins are used under what circumstances is contained in the web script configuration file web-preview.get.config.xml. You can find this in the Share webapp in the directory WEB-INF/classes/alfresco/site-webscripts/org/alfresco/components/preview and view it there, but if you’re planning on making any changes you should first copy it into tomcat/shared/classes/alfresco/web-extension/site-webscripts/org/alfresco/components/preview (you may need to create the directories below web-extension).
You should see that the config is expressed as a series of conditions, each of which contains a list of plug-ins to attempt to use. For each condition that matches, each of the plug-ins within it is then added to the list to try to use.
Here’s an example for MPEG-4 video files

<condition mimeType="video/mp4" thumbnail="imgpreview">
    <plugin poster="imgpreview" posterFileSuffix=".png">StrobeMediaPlayback</plugin>
    <plugin poster="imgpreview" posterFileSuffix=".png">FlashFox</plugin>
    <plugin poster="imgpreview" posterFileSuffix=".png">Video</plugin>
 </condition>

The <condition> element’s attributes form sub-conditions, which are AND-ed together to give the overall result. In this case, the condition will match when the content being viewed has the MIME type video/mp4 and the thumbnail with the name imgpreview can be generated (note, it may not yet have been generated) via the thumbnail service. Other conditions will only test one of these sub-conditions, but here we test both.
Each of the <plugin> elements then provides the name of the plug-in in the element body and any configuration attributes required as element attributes. In this case, each of the plug-ins is being configured to use the imgpreview thumbnail as the static ‘poster’ image to display in the player before the play button is pressed, and to assume this image is in PNG format. Note, these attributes will be different for each plug-in since they are mapped to the options object on the plug-in implementation.
As an administrator, within this file you can add, remove and re-order plug-ins or conditions as you need, change the conditions or change the configuration that is passed to the plug-ins at instantiation time.

Implementing Your Own

It’s easy to create your own plug-ins to render content in interesting and different ways. A plug-in is implemented as a single JavaScript class, which must meet the following simple criteria

  • The name of  its constructor must be the same as the value used in the <plugin> configuration element to tie it into the component, and must be defined as a child property of Alfresco.WebPreview.prototype.Plugin. By convention CamelCase is used for the name, e.g. WebPreviewer, FlashFox, Image.
  • The object prototype must define an object literal named options and two functions report() and display(). See the in-line JSDoc of the out-of-the-box plug-ins for full details of the method signatures and return types required.

You will also need to ensure that the client-side file where you define the plug-in object is included in the <head> section of the Document Details page, you can do this as an administrator by overriding the web-preview.get.head.ftl template, or (better) as a developer by extending the .head.ftl template using a customization module applied to the web-preview.get web script.

New Viewers on Share Extras

Keep an eye out for the upcoming release of the (to-be-renamed) Media Previews add-on in Share Extras. Peter has been instrumental in helping to build up some new plug-ins for viewing different types of content that we hope can be used to remove the Flash dependency that still remains in Share for viewing document-based content. They should also provide some further examples of how you can define your own, custom, plug-ins.

Twitter Dashlets 2.5 Released

If you’ve tried Share Extras‘s Twitter add-ons recently then you might know that the Twitter Dashlets add-on for Alfresco 4 bundles up two similarly-named add-ons that I released last year.
Using the dashlets that the add-on provides, you can follow a particular user or Twitter list, or follow the results of a Twitter search, direct from your Share Dashboard. Great for monitoring the results of a social media campaign related to a particular site, or just topics that are important to you.
But the dashlet that I’m now using the most is the Twitter Timeline dashlet that allows you to easily connect to Twitter’s authenticated resources via OAuth, which I showed at last year’s Alfresco DevCon.
Release 2.5 tidies up a number of bugs but more importantly now provides a consistent experience across all three dashlets, in a couple of areas.
You can view a short video of the updated dashlets or submit feedback on the Share Extras site.
Authenticated access
The Reply, Retweet and Favorite actions were always present on the Twitter Timeline dashlet but now also appear on the other dashlets too, if you have previously connected your Share user account to a Twitter profile.
Twitter dashlet actions
This has the added benefit that you will be able to view the protected tweets of any users you follow in the Twitter Feed dashlet, since requests will always be authenticated whenever you are connected.
Notifications
All three dashlets will periodically poll for new Tweets, and the notifications have been made more compact and added into the dashlet title bar
Twitter Timeline notifications
To load the new tweets, just click the green notification icon.
More options for Search results
The Twitter Search dashlet now allows you to choose between recent tweets, ‘top’ tweets as rated by Twitter, or a mix of the two.

Improved Geo Capabilities in Share Extras

Prompted by Jeff’s post the other day on developing custom actions for Share, I’ve spent a small amount of time this week revisiting the custom Document Library actions hosted on Share Extras, and making a few 4.0-compatibility updates.
There are now two add-ons within the project that provide custom doclib actions for 4.0

  • The Execute Script action provides a small JavaScript action with a dialogue for selecting a JavaScript file to run against a document or folder. Although duplicated by the new onActionFormDialog-powered document-execute-script action (configured out-of-the-box but not included in the default action groups) in 4.0, it provides a useful example of a simple client-side custom action.
  • The Document Geographic Details add-on provides a Geotag action for adding or modifying geo data on repository items, plus two other actions which complement the out-of-the-box View in Google Maps action by adding support for OpenStreetMap, as well as other mapping providers via the Geohack service used by Wikipedia articles.

Since the second add-on provides a few more varied examples of custom actions (which may not be obvious from the project name!) I thought it was worth stepping through each of these in turn.

Geotag Action

This JavaScript action was present in the original 3.x version, and allows geographic information – basically latitude and longitude values – to be added to items, or modified on items which already have it. If your camera doesn’t have a GPS receiver to capture this information, or if you need to make minor corrections to it, then this might be useful.
To make the action compatible with 4.0 there weren’t too many changes needed. Actions are now declared via Bubbling rather than as additional prototype methods of Alfresco.doclib.Actions, so a small refactoring of the code is necessary, but should not affect the main logic of your JavaScript action.
I’ll try to post an update in the near future on converting your existing actions to 4.0.
Geotag action
The action works by calling a separate web script – included within the add-on – to display a modal dialogue with a Google Map view. You can use the cursor to position the marker indicating the item’s location as you need, then the position of the marker is saved back to the repository when the OK button is clicked.

View in OpenStreetMap Action

One of the new features in Alfresco 4.0 is an additional page that allows you to view the geographic location of an item using an embedded Google Map. But sometimes Google Maps just doesn’t have the detail that other mapping providers have – particularly for rural areas – and OpenStreetMap is a great example of a collaboratively-built map available on open terms.
The OpenStreetMap view looks similar to the Google Maps view. In fact, although it declares a new site page, it borrows everything but the map view itself from existing Share page components on the Google Maps page.
View in OpenStreetMap page
The map view is provided by a single additional web script, which delegates the work of setting up the map to a client-side component, as per the regular Share pattern.
When I first added the map view last year, I implemented the view using the popular OpenLayers viewer, but I had problems getting a marker and an associated pop-up that looked as good as Google’s. So I switched the other day to the upstart Leaflet,  which provides a much easier-to-implement and cleaner-looking alternative.
From their site,

Leaflet is a modern, lightweight BSD-licensed JavaScript library for making tile-based interactive maps for both desktop and mobile web browsers, developed by CloudMade to form the core of its next generation JavaScript API.
It is built from the ground up to work efficiently and smoothly on both platforms, utilizing cutting-edge technologies included in HTML5. Its top priorities are usability, performance, small size, A-grade browser support, flexibility and easy to use API. The OOP-based code of the library is designed to be modular, extensible and very easy to understand. Find out more on the features page.

This is the first time I’ve tried Leaflet, but it works well here. I’d be interested in any feedback on this, as I may consider replacing the Google Maps view currently provided in the Document Geographic Details page component with it in the future.
There’s no doubt a few enhancements possible here, such as displaying the location of nearby repository items on the map view in addition to the currently-selected item. Again I’d be interested in any feedback.
Back to the action definition. Now since the action itself just needs to link to the new page, it can be easily handled by the pagelink action type, much easier than implementing a custom client-side action handler. An action evaluator ensures that the action is only shown for geotagged items, and is borrowed from the configuration of the View in Google Maps action.
Lastly, if you look closely at the screenshot you’ll see that as well as displaying the map itself, it also indicates the location of the marker, e.g.

Showing item in London Borough of Richmond upon Thames, Greater London, London, England, United Kingdom

This is looked up via a separate AJAX call to OpenStreetMap’s Nominatim reverse geocoding service, which takes a latitude and longitude, and returns some text indicating the name of the area that it resides in. Nominatim is accessed via a custom Surf endpoint, declared in the add-on’s configuration.

View Location on Geohack Action

The Geohack service allows a referring web site to specify a latitude and longitude as URL parameters and then returns links to that location on a wide range of mapping providers. If you want to view a location on other mapping services besides Google Maps and OpenStreetMap then you can use this action to do that.
Like the Geotag action this is implemented as a client-side JavaScript action, but since it only needs to open a new window with the Geohack service there is no additional dialogue or page definition needed.
With 4.0, it is possible to implement such actions much more easily using the link action type, which takes a target URL in its configuration, and which can include parameters that are substituted in at runtime, based on the node’s properties. But implementing it in a client-side handler as I’ve done here can give you more control.

Developing your own actions

Hopefully the three examples I’ve walked through will provide examples of what is possible to achieve in Alfresco using custom actions. You can find the code linked to from the Document Geographic Details add-on page, and further docs in Client-side template and action extensions on docs.alfresco.com and in Mike Hatfield’s DevCon 2012 presentation.

New Community 4.0.c Amazon EC2 AMI

Since Alfresco Community 4.0.c was quietly slipped out over the Christmas break, I’d been meaning to get round to creating an AMI with the new release. After I was asked by a colleague who was trying to do something similar with our upcoming Enterprise version, I managed to grab a short amount of time to revisit the image creation process that I wrote about back in November.
First off, I decided to reduce some of the complexity by breaking with the past and going for an EBS boot image only, rather than the S3-backed instance-store types that I previously put together. As well as being simpler to create, EBS boot images offer several advantages, the chief one being that by virtue of their EBS-backed storage, they can persist their state when turned off.
From my initial testing, the EBS boot images seem to perform well too, so unless I get any reports of significant problems I’ll be using the same method to create all future AMIs as well.

Running the images

The new 4.0.c image is listed on the Alfresco EC2 Images page. Launching it is easy using the AWS web console.

  1. Click the Launch in AWS Management Console icon next to the version you want to run
  2. Log in using the e-mail and password you registered with Amazon, if you are not already logged in
  3. Step through the wizard, making sure that you select Small in the instance size
  4. When you get to the security part, make sure you select a security profile that has inbound SSH (port 22) and HTTP (port 80) enabled, plus any other interfaces you want to allow into Alfresco

After the wizard has completed you can monitor the start-up of your instance in the Instances area of the EC2 web console. To connect to Alfresco Share on your instance, simply paste the public DNS name of the instance into your browser address bar. It may take up to five minutes for the repository to run through the bootstrap process, after which you will be redirected to the normal Share login page.

Creating your own images

There’s nothing to stop you creating your own Alfresco AMIs, using the same procedure that I’ve used here. As I mentioned, when creating EBS boot instances the procedure is much simpler, and there is no need to copy private key files across the network onto the host.
There are a few steps to carry out, but I’ve tried to make these as straightforward as possible

1. Start up your base image

Start by running up a pre-configured Ubuntu or other Linux AMI – see Canonical’s list for the latest official versions. You should pick the right one for your geography and size requirements, but I use the most recent 32-bit EBS boot AMI from the West Europe region.
To start up the image, just click the relevant link in the list, or use the Launch wizard in the EC2 Management Console to pick an alternative Linux-based AMI. The scripts have been tested with Ubuntu 11.10 and 12.04 32-bit instances, so I’d recommend either of those where possible. 64-bit AMIs do not seem to run well on small instances, so if you use one of those then ensure you pick a larger instance size.

2. Log in via SSH

I use PuTTY to connect to the newly-created instance. You’ll need to have created a keypair for your AWS account and have imported it into PuTTY. You can then connect via SSH as the relevant user account (the Ubuntu images use the user ‘ubuntu’) and the correct key file.
There is general information on connecting to Linux instances in the EC2 help and Eric Hammond has a page that explains some more about securely connecting to the official Ubuntu images, which should be accurate for all the AMIs listed on his site.

3. Download and extract the Ubuntu Quickstart scripts

These are now available as a ZIP download from the alfresco-ubuntu-qs project on Google Code. Find the latest version in the Downloads section and right click to copy the download link to your clipboard.
Back in your SSH client you can then download the files using the following commands (modify as needed for the latest version of the scripts), or you can skip the first line by downloading the file from the project site and SCP’ing it up to your Amazon instance.

curl http://alfresco-ubuntu-qs.googlecode.com/files/alfresco-ubuntu-qs-0.9.8.tar.gz -o alfresco-ubuntu-qs-0.9.8.tar.gz
tar xzf alfresco-ubuntu-qs-0.9.8.tar.gz

4. Install Alfresco
Now we can install the Alfresco files using the install.sh script supplied in the Ubuntu Quickstart package.

cd alfresco-ubuntu-qs
sudo ./install.sh --alf-version 4.2.b --jdk-version openjdk-7 --no-install-dod

Note that the --no-install-dod option to install.sh is only needed for version 4.0, for which the DoD5015.2 add-on module is currently unavailable.
For Alfresco Community 4.2, only JDK7 is required, which you must enable using --jdk-version openjdk-7.
You can follow the progress of the script in your SSH client. When you are prompted for a MySQL password, you must enter ‘alfresco’ – unless you have changed the value of $MYSQL_USER in install.sh to something else.
When the script has finished it will tell you that installation is complete and that Alfresco can be started. DO NOT START ALFRESCO UP AT THIS TIME – unless you want a pre-installed repository to be part of your AMI.
5. Create your AMI
One of the benefits of EBS boot instances is that you can create images with a single command using the AWS API tools, or directly from the EC2 web console.
To create your AMI from the web console, right-click the running instance in the Instances section, select Create Image from the menu, enter an appropriate file path in the ‘name’ field, and confirm.
The image creation process should take no more than a few minutes – far quicker than the instance-store method! Once it is done, you should be able to see your image in the AMIs section of the web console. You can test it by starting up an instance of it as per the steps in Running the images, above.

6. Cleaning up

Last but not least, make sure that the instance that you used to create your image is terminated, so it will not keep on consuming billable resources! You should do the same for any additional instances that you’ve fired up to test your image.

My Open Data Consultation Response

This is my response to the UK Government’s Open Data Consultation, which I submitted via email today.

Although I wanted to respond earlier, I’m glad I waited, as my experience assisting (or at least trying to assist) with data gathering for Gail Knight’s Great British Toilet Map has been pretty instrumental in shaping my views.

Of the three London boroughs I put my query to, one (to their credit) explained that they didn’t hold data on such a thing, another required a legal approval process for re-use which still leaves me with some doubt on the terms under which I can re-use the information, and another for my local borough sadly seems stuck in an ongoing FoI request.

This is the sad reality of open public data in the UK today, that most of it is not open and with large swathes of ignorance among the very people who are the biggest stake holders in all of this – those people who work for local Government.

So here’s my response below – if you have any interest at all in this field or at least an understanding of the benefits that open data will bring, I’d urge you to submit something yourself before the consultation closes at midnight tonight.

–BEGINS–

I am an independent software developer and open data advocate and have been actively involved in a number of collaborative projects aiming to bring the benefits of free and open data to a wider portion of society.

Most recently I have been involved in the Great British Toilet Map a project which seeks to provide a single map of all public conveniences in the United Kingdom. This has involved me making requests for open data from a number of London Borough Councils, a process which I have found extremely difficult and only partly successful despite the trivial nature and very low volume of the data concerned.

In light of this experience I am supportive of the idea that a “right to data” is helpful for citizens, developers, entrepreneurs and ultimately our wider economy and societal well-being. My experience to date has been of public sector organisations with little or no knowledge at all of this important new area of information governance, and with few resources and little inclination to assist those of us who are currently trying, despite the substantial barriers, to develop innovative services which not only create value in themselves but also expose the ‘bottlenecks’ within our public sector. This is a win-win scenario for all parties concerned.

Clearly, there is work to be done to improve this situation. Although I believe education of public sector organisations and those acting on their behalf will help to address the lack of knowledge, I believe immediate and concrete action is needed from Central Government, which also needs to do more to ‘lead from the front’. I hope to provide some further views on what shape this action may take in the detailed response which follows.

How we might enhance a “right to data”, establishing stronger rights for individuals, businesses and other actors to obtain data from public bodies and about public services

A right to data is of vital importance to establishing a vibrant open data ecosystem in the United Kingdom and the associated economic benefits that numerous studies have shown this will bring.

Since such a large volume of data is held at both a national and regional level relating to public services, by public bodies, it is vital that such a right to data applies to all bodies providing services to the public utilising public money, in full or in part.

However such rights must fit in with the current Freedom of Information (FoI) landscape and in particular the Reuse of Public Sector Information (PSI) guidelines which govern how this information may re-used. These alone are not sufficient to provide a “right to data” but does provide a useful and similar example which has been generally successful in its implementation.

Existing FoI legislation provides the broad availability of information to individuals, businesses and other actors and this is also a key requirement for a right to data. However FoI does not adequately address other concerns connected with requesting open data from public bodies. In particular,

  • FoI does not encourage the re-use of released information, and in fact most organisations prohibit this without explicit and additional approval. This often in my experience requires a legal review which introduces unnecessary delays and costs. Under a right to data requesters should be granted the ability to re-use the data by default, rather than as an exception. The re-use should not require disclosure of the purposes of the re-use or the intent of the requester, instead the information should be explicitly granted under a standard licence such as the Open Government Licence (OGL). Requesters should have the right to request release under an alternative open licence if required.
  • Most often data supplied under FoI is derived data delivered in unstructured formats, even where this exists in a structured form within the organisation. A right to data should shift the focus to providing the full and raw information, with the exception of any personal data that falls under the scope of the existing Data Protection Act. There should be a presumption in favour of publishing the full raw data unless it can clearly be shown that this is not possible (see below).
  • It is not always made clear what related data is held by the organisation, or where information has been not included in the response. Organisations should publish an open list of the data held by them internally, for what purposes, and who has access to each system, in order to allow requesters to place suitable requests in the first place.

Although the structures provided by FoI are helpful in allowing citizens access to public data, the limitations above mean that is currently a rather blunt instrument for requesting open data from organisations. The Government must therefore strongly consider bringing forward additional primary legislation with the view of setting up a similar framework for open data, or modifying the existing framework to overcome these deficiencies.

How to set transparency standards that enforce this right to data

Transparency is an end goal of the greater openness which a right to data seeks to deliver. Organisations should understand their responsibilities and duties around transparency, but a greater level of openness should also be seen as a way to deliver increased involvement of citizens around public issues, and greater levels of engagement with the bodies themselves.

Although openness itself is difficult to measure, qualitative measurements of external engagement levels and of transparency in decision making should be used to provide comparisons between organisations.

Within local government although excellent levels of transparency and accountability often exist at the executive level, less is to be found at the departmental level below that, and therefore this should be particular focal point for comparisons.

How public bodies and providers of public services might be held to account for delivering Open Data

The Information Commissioner Office (ICO) guidance provides a useful model for dispute resolution in FoI requests. The process could be similar for the new right to data, providing an option for requesters to request an internal review if they are unhappy with the handling of a particular case, followed by an external review by the ICO should this not be sufficient to resolve the situation.

Although a new body could be considered to police the system and hold organisations to account, the ICO has a great deal of experience in this area already and may prove a more effective – and cost effective – solution.

Penalties should be applicable for organisations which consistently fail to deliver on their expectations, but the design of these penalties should ensure that money is not taken away from the field of open data. For instance, if it is deemed that a financial penalty is appropriate, this money should be channelled into a central fund for other open data projects in the public sector.

How we might ensure collection and publication of the most useful data

Sites such as data.gov.uk act as a useful focus point for coordinating the release of new open data. The Requests section in particular offers a way of gauging which data sets have the most interest around them, but the current implementation contains too large a number of requests, many duplicated, and requires more active management. The ability of users to vote on other requests is key in determining the level of interest, but there is no requirement on the Cabinet Office to respond to requests when they reach a certain level of interest and more generally it is not clear how this list translates into action. This should be rectified immediately.

Although some data may be published pro-actively by organisations on sites such as data.gov.uk, this alone is not sufficient, and therefore the focus must be in giving citizens themselves the right to request any information held by any public-financed organisation as open data.

Since my experience has shown that many organisations today are not sufficiently enlightened in this field, it is necessary to examine the reasons why requests made under a “right to data” could be refused, and to mitigate against these.

Not all public data will be possible to publish in an open and machine-readable format. It may be that data exists in legacy systems which have not been designed with an open export format in mind.

However this alone should not be a sufficient reason for organisations to refuse requests. It may be possible that even where the organisation lacks the expertise or the budget to produce the raw data exports requires, that this expertise exists in other companies or organisations. Indeed, this could be used to stimulate activity in the SME IT sector if the work were offered through a public tender. Voluntary groups may also be interested in helping in situations where the commercial sector is unable to meet the challenge, and their costs could be met through a central fund where funding streams within the organisation are not available.

Only when it can be demonstrated that an organisation has done all it can to extract information from its internal systems itself and that it has also seeked external input and failed to come up with a solution, that alternatives to the original source data should be evaluated, or – where no alternatives are available – that the original request should be refused. Even in those circumstances, it should be possible for the requester to request a review of the decision should any of the circumstances change in the future (e.g. a change in IT systems).

Organisations also have a responsibility when procuring new IT systems to ensure that data is stored in open formats, or at the very least, can be exported in open formats in real time. Organisations should be accountable for this and should be able to demonstrate as part of the public tender process that they have taken this requirement into account for all new systems.

How we might make the internal workings of government and the public sector more open

Greater transparency must be recognised up-front as a key driver of the proposed “right to data”, to ensure that taxpayers are receiving best value for money and that officials are held accountable. Progress on this front should be actively monitored by central government and additional steps taken where necessary to ensure that periodic goals set by Government are met. Timelines for action should be published to allow citizens to further hold those in this oversight role to account.

How far there is a role for government to stimulate enterprise and market making in the use of Open Data

The Government and other public bodies under its control have a clear responsibility to make all public data openly available as the default option. It should not attempt to influence the open data ecosystem which remains at an early stage of development and shows considerable promise that it will develop as a world-leader in the field.

However, Government has a role to play in ensuring that the data itself is made freely available and should prioritise the release of the ‘base’ data which it holds such as mapping and weather information, which is required in order to give context to the majority of the other data sets published.

Lastly, the government can help in the longer term by encouraging the use of open standards by data publishers and in providing more general education and best practice to them.

–ENDS–

Alfresco 4.0 in Amazon EC2

Update January ’12: These instructions are now deprecated. A simpler procedure, allowing easy creation of EBS-boot images is now documented as a follow-up post.
I’ve just added a new AMI for Alfresco 4 onto my Alfresco EC2 Images list. Running these files is now even easier, based on the method used by Eric Hammond’s alestic.com, with a link next to each image that allows you to click directly through to the AWS Management Console. If you have an AWS account, you’re now just a few clicks away from launching your own cloud-based instance of Alfresco 4.0.
Of course, the usual disclaimers apply here. These are not official images in any way, and should not be used for production purposes. But if you want to try out Alfresco 4 without the hassle of managing your own install, hopefully it will be useful.
It’s also worth pointing out that the scripts I use to create these are public, hosted on the alfresco-ubuntu-qs project on Google Code.
You should be able to create your own Alfresco 4 AMIs by following these simple steps

  1. Start by running up a preconfigured Ubuntu or other Linux AMI – I use Eric Hammond’s list for the latest versions. Pick the right one for your geography and size requirements, I use the most recent 32-bit instance-store AMI from the W Europe region
  2. While the instance is starting up, download the latest Quickstart scripts from Google Code
  3. Once the machine is started, check you can connect to it via SSH, using the keypair you specified when starting the image and the username ‘ubuntu’
  4. Create a new directory named ‘ec2’ in your home directory on the running instance
  5. Use SCP or rsync to copy the quickstart scripts bundle, plus your AWS certificate and private key files (cert-blah.cert and cert-blah.pk) from your local machines. Place the script bundle in /home/ubuntu and the certificate and key files in the new /home/ubuntu/ec2 directory
  6. Back in your SSH session on the instance, extract the contents of the quickstart bundle and change into the new alfresco-ubuntu-qs directory
  7. Use the install.sh script to install Alfresco and its dependencies on the instance by typing sudo ./install.sh. For 4.0.a and 4.0.b, which do not support the DOD Records Management module, you will need to add the --no-install-dod option to the command.
  8. The script will run through and you will be prompted for a MySQL password. You must enter ‘alfresco’ unless you have changed the value of $MYSQL_USER in the script to something else.
  9. The script will indicate that it has finished installing Alfresco. Do not start Tomcat, since this will bootstrap the repository data, which you do not want to do before bundling.
  10. Change back into your home directory
  11. Create the AMI files using the sc2-bundle-vol command
    sudo ec2-bundle-vol -d /mnt -p alfresco-community-mysql-4.0.a-i386 -u 111111111111 -k ec2/pk-*.pem -c ec2/cert-*.pem -e /home/ubuntu/ec2,/home/ubuntu/.ssh,/home/ubuntu/.cache,/home/ubuntu/.sudo_as_admin_successful,/home/ubuntu/.byobu,/home/ubuntu/alfresco-ubuntu-qs,/home/ubuntu/alfresco-ubuntu-qs-*.tar -s 4096
    You must set your numerical AWS account ID using the -u flag. Also you should review the list of excluded files to ensure that you are not bundling any files that you do not want to.
  12. Once the bundling process has finished, upload it to your S3 bucket using the ec2-upload-bundle command
    ec2-upload-bundle -b my-s3-bucket -m /mnt/alfresco-community-mysql-4.0.a-i386.manifest.xml -a aid -s secret --location EU
    You must specify your S3 bucket name using the -b option, and ensure that you set your AWS access key and AWS secret key using the -a and -s options
  13. Once the upload has completed, log into your AWS EC2 web console, navigate to the AMIs section and click the Register New AMI button to register your new image. Enter the path of the uploaded manifest file within the bundle you just uploaded, this will be something like ‘my-s3-bucket/alfresco-community-mysql-4.0.a-i386.manifest.xml’
  14. Now your AMI is registered you can see if it works by creating a new instance of it. If it does, then you can safely shut down the originial Ubuntu instance as you will no longer need this.
  15. If you want others to be able to run your image then you will need to add the necessary permissions for this, using the web console.