My Open Data Consultation Response

This is my response to the UK Government’s Open Data Consultation, which I submitted via email today.

Although I wanted to respond earlier, I’m glad I waited, as my experience assisting (or at least trying to assist) with data gathering for Gail Knight’s Great British Toilet Map has been pretty instrumental in shaping my views.

Of the three London boroughs I put my query to, one (to their credit) explained that they didn’t hold data on such a thing, another required a legal approval process for re-use, which still leaves me in some doubt about the terms under which I can re-use the information, and the third – my local borough – sadly seems stuck in an ongoing FoI request.

This is the sad reality of open public data in the UK today: most of it is not open, and there are large swathes of ignorance among the very people who are the biggest stakeholders in all of this – the people who work for local government.

So here’s my response below – if you have any interest at all in this field or at least an understanding of the benefits that open data will bring, I’d urge you to submit something yourself before the consultation closes at midnight tonight.

–BEGINS–

I am an independent software developer and open data advocate and have been actively involved in a number of collaborative projects aiming to bring the benefits of free and open data to a wider portion of society.

Most recently I have been involved in the Great British Toilet Map, a project which seeks to provide a single map of all public conveniences in the United Kingdom. This has involved me making requests for open data from a number of London Borough Councils, a process which I have found extremely difficult and only partly successful despite the trivial nature and very low volume of the data concerned.

In light of this experience I am supportive of the idea that a “right to data” is helpful for citizens, developers, entrepreneurs and ultimately our wider economy and societal well-being. My experience to date has been of public sector organisations with little or no knowledge at all of this important new area of information governance, and with few resources and little inclination to assist those of us who are currently trying, despite the substantial barriers, to develop innovative services which not only create value in themselves but also expose the ‘bottlenecks’ within our public sector. This is a win-win scenario for all parties concerned.

Clearly, there is work to be done to improve this situation. Although I believe education of public sector organisations and those acting on their behalf will help to address the lack of knowledge, I believe immediate and concrete action is needed from Central Government, which also needs to do more to ‘lead from the front’. I hope to provide some further views on what shape this action may take in the detailed response which follows.

How we might enhance a “right to data”, establishing stronger rights for individuals, businesses and other actors to obtain data from public bodies and about public services

A right to data is of vital importance to establishing a vibrant open data ecosystem in the United Kingdom and the associated economic benefits that numerous studies have shown this will bring.

Since such a large volume of data relating to public services is held by public bodies at both national and regional level, it is vital that such a right to data applies to all bodies providing services to the public which are funded, in full or in part, by public money.

However such rights must fit in with the current Freedom of Information (FoI) landscape, and in particular the Re-use of Public Sector Information (PSI) guidelines which govern how this information may be re-used. These alone are not sufficient to provide a “right to data”, but they do provide a useful and similar example, one which has been generally successful in its implementation.

Existing FoI legislation provides for the broad availability of information to individuals, businesses and other actors, and this is also a key requirement for a right to data. However FoI does not adequately address other concerns connected with requesting open data from public bodies. In particular:

  • FoI does not encourage the re-use of released information, and in fact most organisations prohibit this without explicit and additional approval. In my experience this often requires a legal review, which introduces unnecessary delays and costs. Under a right to data, requesters should be granted the ability to re-use the data by default, rather than as an exception. Re-use should not require disclosure of the purpose of the re-use or the intent of the requester; instead the information should be released explicitly under a standard licence such as the Open Government Licence (OGL). Requesters should have the right to request release under an alternative open licence if required.
  • Most often data supplied under FoI is derived data delivered in unstructured formats, even where this exists in a structured form within the organisation. A right to data should shift the focus to providing the full and raw information, with the exception of any personal data that falls under the scope of the existing Data Protection Act. There should be a presumption in favour of publishing the full raw data unless it can clearly be shown that this is not possible (see below).
  • It is not always made clear what related data is held by the organisation, or where information has not been included in the response. Organisations should publish an open list of the data they hold internally, the purposes for which it is held, and who has access to each system, in order to allow requesters to frame suitable requests in the first place.

Although the structures provided by FoI are helpful in allowing citizens access to public data, the limitations above mean that it is currently a rather blunt instrument for requesting open data from organisations. The Government must therefore strongly consider bringing forward additional primary legislation with a view to setting up a similar framework for open data, or modifying the existing framework to overcome these deficiencies.

How to set transparency standards that enforce this right to data

Transparency is an end goal of the greater openness which a right to data seeks to deliver. Organisations should understand their responsibilities and duties around transparency, but a greater level of openness should also be seen as a way to deliver increased involvement of citizens around public issues, and greater levels of engagement with the bodies themselves.

Although openness itself is difficult to measure, qualitative measurements of external engagement levels and of transparency in decision making should be used to provide comparisons between organisations.

Within local government, although excellent levels of transparency and accountability often exist at the executive level, less is to be found at the departmental level below, and this should therefore be a particular focal point for comparisons.

How public bodies and providers of public services might be held to account for delivering Open Data

The Information Commissioner’s Office (ICO) guidance provides a useful model for dispute resolution in FoI requests. The process could be similar for the new right to data, providing an option for requesters to request an internal review if they are unhappy with the handling of a particular case, followed by an external review by the ICO should this not be sufficient to resolve the situation.

Although a new body could be considered to police the system and hold organisations to account, the ICO has a great deal of experience in this area already and may prove a more effective – and cost effective – solution.

Penalties should be applicable for organisations which consistently fail to deliver on their expectations, but the design of these penalties should ensure that money is not taken away from the field of open data. For instance, if it is deemed that a financial penalty is appropriate, this money should be channelled into a central fund for other open data projects in the public sector.

How we might ensure collection and publication of the most useful data

Sites such as data.gov.uk act as a useful focal point for coordinating the release of new open data. The Requests section in particular offers a way of gauging which data sets have the most interest around them, but the current implementation contains too many requests, many of them duplicates, and requires more active management. The ability of users to vote on other requests is key to determining the level of interest, but there is no requirement on the Cabinet Office to respond to requests once they reach a certain level of interest, and more generally it is not clear how this list translates into action. This should be rectified immediately.

Although some data may be published pro-actively by organisations on sites such as data.gov.uk, this alone is not sufficient, and therefore the focus must be on giving citizens themselves the right to request any information held by any publicly-financed organisation as open data.

Since my experience has shown that many organisations today are not sufficiently enlightened in this field, it is necessary to examine the reasons why requests made under a “right to data” could be refused, and to mitigate them.

Not all public data will be possible to publish in an open and machine-readable format. It may be that data exists in legacy systems which have not been designed with an open export format in mind.

However this alone should not be sufficient reason for organisations to refuse requests. Even where the organisation lacks the expertise or the budget to produce the required raw data exports, that expertise may exist in other companies or organisations. Indeed, this could be used to stimulate activity in the SME IT sector if the work were offered through a public tender. Voluntary groups may also be interested in helping in situations where the commercial sector is unable to meet the challenge, and their costs could be met through a central fund where funding streams within the organisation are not available.

Only when it can be demonstrated that an organisation has done all it can to extract the information from its internal systems itself, and has also sought external input without finding a solution, should alternatives to the original source data be evaluated, or – where no alternatives are available – the original request be refused. Even in those circumstances, it should be possible for the requester to request a review of the decision should any of the circumstances change in the future (e.g. a change in IT systems).

Organisations also have a responsibility when procuring new IT systems to ensure that data is stored in open formats, or at the very least, can be exported in open formats in real time. Organisations should be accountable for this and should be able to demonstrate as part of the public tender process that they have taken this requirement into account for all new systems.

How we might make the internal workings of government and the public sector more open

Greater transparency must be recognised up-front as a key driver of the proposed “right to data”, to ensure that taxpayers are receiving best value for money and that officials are held accountable. Progress on this front should be actively monitored by central government and additional steps taken where necessary to ensure that periodic goals set by Government are met. Timelines for action should be published to allow citizens to further hold those in this oversight role to account.

How far there is a role for government to stimulate enterprise and market making in the use of Open Data

The Government and other public bodies under its control have a clear responsibility to make all public data openly available as the default option. It should not attempt to influence the open data ecosystem, which remains at an early stage of development and shows considerable promise to develop into a world leader in the field.

However, Government has a role to play in ensuring that the data itself is made freely available and should prioritise the release of the ‘base’ data which it holds such as mapping and weather information, which is required in order to give context to the majority of the other data sets published.

Lastly, the government can help in the longer term by encouraging the use of open standards by data publishers and in providing more general education and best practice to them.

–ENDS–

Alfresco 4.0 in Amazon EC2

Update January ’12: These instructions are now deprecated. A simpler procedure, allowing easy creation of EBS-boot images, is now documented in a follow-up post.
I’ve just added a new AMI for Alfresco 4 onto my Alfresco EC2 Images list. Running these images is now even easier, based on the method used by Eric Hammond’s alestic.com, with a link next to each image that allows you to click directly through to the AWS Management Console. If you have an AWS account, you’re now just a few clicks away from launching your own cloud-based instance of Alfresco 4.0.
Of course, the usual disclaimers apply here. These are not official images in any way, and should not be used for production purposes. But if you want to try out Alfresco 4 without the hassle of managing your own install, hopefully it will be useful.
It’s also worth pointing out that the scripts I use to create these are public, hosted on the alfresco-ubuntu-qs project on Google Code.
You should be able to create your own Alfresco 4 AMIs by following these simple steps:

  1. Start by running up a preconfigured Ubuntu or other Linux AMI – I use Eric Hammond’s list for the latest versions. Pick the right one for your geography and size requirements; I use the most recent 32-bit instance-store AMI from the W Europe region
  2. While the instance is starting up, download the latest Quickstart scripts from Google Code
  3. Once the machine is started, check you can connect to it via SSH, using the keypair you specified when starting the image and the username ‘ubuntu’
  4. Create a new directory named ‘ec2’ in your home directory on the running instance
  5. Use SCP or rsync to copy the quickstart scripts bundle, plus your AWS certificate and private key files (cert-*.pem and pk-*.pem) from your local machine. Place the script bundle in /home/ubuntu and the certificate and key files in the new /home/ubuntu/ec2 directory
  6. Back in your SSH session on the instance, extract the contents of the quickstart bundle and change into the new alfresco-ubuntu-qs directory
  7. Use the install.sh script to install Alfresco and its dependencies on the instance by typing sudo ./install.sh. For 4.0.a and 4.0.b, which do not support the DOD Records Management module, you will need to add the --no-install-dod option to the command.
  8. The script will run through and you will be prompted for a MySQL password. You must enter ‘alfresco’ unless you have changed the value of $MYSQL_USER in the script to something else.
  9. The script will indicate that it has finished installing Alfresco. Do not start Tomcat, since this will bootstrap the repository data, which you do not want to do before bundling.
  10. Change back into your home directory
  11. Create the AMI files using the ec2-bundle-vol command
    sudo ec2-bundle-vol -d /mnt -p alfresco-community-mysql-4.0.a-i386 -u 111111111111 -k ec2/pk-*.pem -c ec2/cert-*.pem -e /home/ubuntu/ec2,/home/ubuntu/.ssh,/home/ubuntu/.cache,/home/ubuntu/.sudo_as_admin_successful,/home/ubuntu/.byobu,/home/ubuntu/alfresco-ubuntu-qs,/home/ubuntu/alfresco-ubuntu-qs-*.tar -s 4096
    You must set your numerical AWS account ID using the -u flag. Also you should review the list of excluded files to ensure that you are not bundling any files that you do not want to include.
  12. Once the bundling process has finished, upload it to your S3 bucket using the ec2-upload-bundle command
    ec2-upload-bundle -b my-s3-bucket -m /mnt/alfresco-community-mysql-4.0.a-i386.manifest.xml -a aid -s secret --location EU
    You must specify your S3 bucket name using the -b option, and ensure that you set your AWS access key and AWS secret key using the -a and -s options
  13. Once the upload has completed, log into your AWS EC2 web console, navigate to the AMIs section and click the Register New AMI button to register your new image. Enter the path of the uploaded manifest file within the bundle you just uploaded; this will be something like ‘my-s3-bucket/alfresco-community-mysql-4.0.a-i386.manifest.xml’
  14. Now that your AMI is registered, you can see if it works by creating a new instance of it. If it does, you can safely shut down the original Ubuntu instance as you will no longer need it.
  15. If you want others to be able to run your image then you will need to add the necessary permissions for this, using the web console.
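As a recap, steps 11 and 12 can be rolled into a single script. This is a sketch only: the account ID, bucket name and credential variables below are placeholders you must replace with your own values, the exclude list is abbreviated and should be reviewed against step 11, and the commands are skipped entirely if the EC2 AMI tools are not on your path.

```shell
# Sketch of the bundle-and-upload steps. Placeholders: replace ACCOUNT_ID,
# BUCKET, AWS_ACCESS_KEY and AWS_SECRET_KEY with your own values.
ACCOUNT_ID=111111111111                     # your numerical AWS account ID (-u)
BUCKET=my-s3-bucket                         # your S3 bucket name (-b)
PREFIX=alfresco-community-mysql-4.0.a-i386  # image name prefix (-p)
MANIFEST=/mnt/$PREFIX.manifest.xml          # manifest written by ec2-bundle-vol

# Only run if the EC2 AMI tools are installed (i.e. on the instance itself).
# Review and extend the exclude list (-e) as shown in step 11.
if command -v ec2-bundle-vol >/dev/null 2>&1; then
    sudo ec2-bundle-vol -d /mnt -p "$PREFIX" -u "$ACCOUNT_ID" \
        -k ec2/pk-*.pem -c ec2/cert-*.pem -s 4096 \
        -e /home/ubuntu/ec2,/home/ubuntu/.ssh,/home/ubuntu/alfresco-ubuntu-qs
    ec2-upload-bundle -b "$BUCKET" -m "$MANIFEST" \
        -a "$AWS_ACCESS_KEY" -s "$AWS_SECRET_KEY" --location EU
fi
echo "Manifest path: $MANIFEST"
```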

New 4.0 features for Share Import-Export

If you’ve not come across the project before, Share Import-Export provides a set of Python scripts to export sites and their supporting data into a standardised structure based on ACP and JSON, and can also then import these definitions into another Alfresco system. As well as the scripts, a set of sample sites from the Alfresco Cloud Trial are provided to help you get started.
With version 1.3 of Share Import-Export fresh out of the door, I wanted to post a quick update on the changes in this version, which are designed to provide even more options for those early adopters of Alfresco 4.0. So you can read on for more details, or download the new version straight away.
Tag support
The first major addition is the new support for importing and exporting the tags associated with site content. Share sites just look more complete with tags populated, so this has been on the list for a little while.
Unfortunately the ACP format used doesn’t currently allow tag definitions to be embedded in exports of site content, but tag information can easily be pulled out in JSON format and persisted to an additional file alongside the ACP.
You should see this if you run export-site.py with the --export-tags option, and I’ve also added tag data for the sample sites in the package, so you should see the definitions in the data folder too.
When importing sites with import-site.py, again the tag definitions aren’t yet included by default, but you can include them using the --import-tags option.
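As a quick sketch, a round trip with tags might look like the following. The positional arguments here are illustrative assumptions on my part – check each script’s --help output for the exact usage; only the --export-tags and --import-tags flags come from the description above.

```shell
# Illustrative only: the site name and JSON file arguments are assumptions,
# not the scripts' documented usage. Run this from the directory containing
# the scripts (the commands are skipped here if they are not present).
if [ -f export-site.py ] && [ -f import-site.py ]; then
    python export-site.py branding data/sites/branding.json --export-tags
    python import-site.py data/sites/branding.json --import-tags
fi
```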
The idea is that this will help when demonstrating Alfresco Share in 4.0, but it’s not limited to that version and works well with 3.4 in my own tests. So please do try this out and post your feedback.
Bootstrap your sites
The second improvement is specific to 4.0, but allows sites that you’ve exported using the scripts to be packaged up as bootstrappable components – complete with Spring configuration – inside a single JAR file, which can be automatically imported when Alfresco is started up.
The format used is the same as the Web Site Design Project site, first introduced in Alfresco Team, and also included in Alfresco 4.0, which provides the default sample content. So you can install additional users and sites alongside the out-of-the-box site, or completely replace it. (Look for the bean with id patch.siteLoadPatch.swsdp in the patch-services-context.xml file in WEB-INF/classes/alfresco/patch inside the Alfresco webapp, which you can comment out to disable the default site)

Bootstrap your own site definitions alongside the default sample site


Of course you can also continue to import sites manually using the import-site.py script, but if you’re distributing Alfresco instances to others (say a demo package used across your sales team) then the bootstrap packages can be useful to automatically import the sites when Alfresco is started up.
To package up a site in this format, you must previously have exported the site from Alfresco into the normal local structure using export-site.py. You can then point the new script create-bootstrap-package.py to this local definition and tell it the name of the JAR file to produce, containing the bootstrap components, for example

python create-bootstrap-package.py data/sites/branding.json sample-branding-site.jar --users-file=data/cloud-users.json

You should find that this builds a JAR file named sample-branding-site.jar in the current directory, using the contents of the Company Rebranding site from the bundled cloud trial sites. If you have problems, or you want to know what other options are available, type python create-bootstrap-package.py --help for more information.
Other improvements
Lots of bug fixes and small improvements have gone into this release, based on testing across various versions of Alfresco from 3.2 to 4.0.
Download packages
If you check out the downloads page you’ll see that there’s now a choice of packages there. The full 54MB package, with scripts and full Green Energy sample sites and users, is still recommended for most users and is linked to from the home page. However, you can also grab a version with just the scripts plus user data (8MB), or with just the scripts themselves and no sample data (61kB). If you don’t need the full sample data, you might find one of these smaller packages useful.
Download Share Import-Export from Google Code