Managing Regional Spatial Data
I had a bit of time to think about how regional GIS data was managed in the context of a national system: that is, how national data sets that are split between regional teams were best controlled and, eventually, merged, or at least conceptualised as a single data set. Some time ago I wrote a requirements document on how a regional data store system could work. There are a couple of bogeymen at work here:
- Some regional data sets are captured to local data standards (i.e. locally decided attributes, locally decided positional accuracy)
- Some regional data sets are captured to national data standards
That’s just the data side. Add to that the presentation picture:
- Sometimes, users want to see regional portions of a data set
- Sometimes, users want to see the whole national picture, and expect seamlessness around regional boundaries.
This does, indeed, cause problems. Clearly, national standards are the way forward here, but they aren't always available, or haven't been decided yet, or, more often than not, regional groups have done it 'their way'. There are obviously a number of choices here:
- Make all the data sets conform to a nationally agreed standard post-event, which might be a subset of the ‘best bits’ of each
- Start again with the national standard that you really wanted everyone to capture to
- Don’t bother with standards
If you were to choose the last option, it might just be business-reasonable if it's a low-value data set. However, it implies that you don't want to see the data in its seamless entirety, or that you're happy with a Cartesian join across all regions that might just give you a massive (and potentially duplicate-bearing) attribute set.
Don’t forget edge matching
And then, of course, there's seamless boundary creation. If data has been created by different people in different regions, you're bound to get some edge-matching issues. You've got your regional boundaries as templates, so they're going to be useful for clipping features to where the edges are supposed to be. You might also choose to extend certain features to the boundary where required, to avoid gaps. Where linear features don't meet across a boundary, for instance, you have to decide on a strategy: join them at the midpoint, or go with one region's version over the other's.
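As a rough illustration of those two strategies, here is a minimal Python sketch for reconciling a pair of line endpoints that should meet at a regional boundary. The function name `edge_match`, the `tolerance` parameter and the strategy names are hypothetical. The sketch assumes both regions use the same projected coordinate system. A real edge-matching tool would also clip against the boundary polygon and handle full geometries rather than single endpoints.

```python
from math import hypot
from typing import Optional, Tuple

Point = Tuple[float, float]


def edge_match(end_a: Point, end_b: Point, tolerance: float,
               strategy: str = "midpoint") -> Optional[Point]:
    """Return the node that two boundary endpoints should both snap to.

    end_a is the endpoint from region A's copy of a linear feature and
    end_b the matching endpoint from region B's copy. If they are further
    apart than `tolerance`, they are not treated as the same feature.
    """
    gap = hypot(end_a[0] - end_b[0], end_a[1] - end_b[1])
    if gap > tolerance:
        return None  # too far apart: leave for manual review rather than guess
    if strategy == "midpoint":
        # Mid-point join: neither region's geometry is favoured.
        return ((end_a[0] + end_b[0]) / 2, (end_a[1] + end_b[1]) / 2)
    if strategy == "prefer_a":
        return end_a  # go with region A over region B
    if strategy == "prefer_b":
        return end_b  # go with region B over region A
    raise ValueError(f"unknown strategy: {strategy!r}")


print(edge_match((100.0, 50.0), (100.0, 52.0), tolerance=5.0))  # (100.0, 51.0)
```

Endpoints further apart than the tolerance come back as `None`. This is deliberate: a join that is clearly not the same feature should be reviewed by hand rather than silently snapped.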

The practical option in this case is the conform-to-best-shot approach. After all, that's what data cleansing is about. I'd go as far as saying that 1Spatial's Radius Studio is probably your best friend on this one: if you've got more than one copy of the data and want to do some spatial checking on the way in, that's the route you're likely to go down.
Is it justifiable? Well, again, it depends on the value of the data set. If you're going for something that needs to be nice and clean (as it should be; don't forget we NEED data quality out there), you should be coming up with a master schema that takes in the best elements of each, perhaps with a bit of massaging, plus a bit of edge cleaning.
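One way to picture a master schema that takes in the best elements of each region is a per-region mapping from local field names onto the master fields. In this picture, an optional conversion function does the massaging. The sketch below is a minimal illustration under stated assumptions. The regions ('north', 'south'), the field names and the converters are all hypothetical, and this is not how any particular tool implements it.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical master schema agreed nationally.
MASTER_FIELDS = ("road_class", "surface", "width_m")

# Hypothetical per-region mappings:
#   local field name -> (master field name, optional converter).
# The converter does the "massaging" (unit changes, code tidy-ups, etc.).
FieldMap = Dict[str, Tuple[str, Optional[Callable[[Any], Any]]]]

REGION_MAPPINGS: Dict[str, FieldMap] = {
    "north": {
        "RdClass": ("road_class", str.upper),
        "Surf": ("surface", None),
        "WidthFt": ("width_m", lambda feet: round(feet * 0.3048, 2)),
    },
    "south": {
        "road_class": ("road_class", str.upper),
        "width": ("width_m", None),  # assumed to be metres already
    },
}


def to_master(region: str, feature: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one regional feature's attributes into the master schema."""
    mapping = REGION_MAPPINGS[region]
    result: Dict[str, Any] = {name: None for name in MASTER_FIELDS}
    for local_name, value in feature.items():
        if local_name not in mapping:
            continue  # local-only attribute: not carried into the master schema
        master_name, convert = mapping[local_name]
        if convert is not None and value is not None:
            value = convert(value)
        result[master_name] = value
    return result


print(to_master("south", {"road_class": "b", "width": 6.5, "lanes": 2}))
# {'road_class': 'B', 'surface': None, 'width_m': 6.5}
```

Unmapped local attributes (such as `lanes` here) are dropped. Master fields that a region never captured come through as `None`. This makes the gaps in each region's contribution explicit rather than hidden.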
But voilà: you can come up with a great number of data sets that look as though one person managed the whole country's load in their spare time.

I think there are three key factors in data quality and understanding: the data model (including the rules by which you allow your data to be captured), applying those rules at the time of capture, and capturing per-feature metadata.
Get these right and you are set. But that's one thing to say and quite another to actually do!
The first issue (the data model and validation rules) is a GI science issue; the other two are largely GI system issues. I think where most people go wrong is focusing on only one or two of these areas.
There's definitely a need for technical solutions, but 'there will be trouble ahead' without some good fundamental data model design work too.
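To make "apply the rules at capture, and record per-feature metadata" concrete, here is a minimal Python sketch. The rule set, the field names and the metadata keys are all hypothetical. The sketch only shows two things. First, the rules that define the model are checked before a feature is accepted. Second, each accepted feature records who captured it, from what source, when, and under which rules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical capture rules: (human-readable description, check on attributes).
Rule = Tuple[str, Callable[[Dict[str, Any]], bool]]

RULES: List[Rule] = [
    ("road_class is one of A, B, C",
     lambda attrs: attrs.get("road_class") in {"A", "B", "C"}),
    ("width_m is a positive number",
     lambda attrs: isinstance(attrs.get("width_m"), (int, float))
     and attrs["width_m"] > 0),
]


@dataclass
class CapturedFeature:
    attributes: Dict[str, Any]
    metadata: Dict[str, Any] = field(default_factory=dict)


def capture(attributes: Dict[str, Any], captured_by: str, source: str,
            rules: List[Rule] = RULES) -> CapturedFeature:
    """Apply the model's rules at capture time and stamp per-feature metadata."""
    failures = [desc for desc, check in rules if not check(attributes)]
    if failures:
        # Reject at the point of capture instead of cleaning up later.
        raise ValueError("rejected at capture: " + "; ".join(failures))
    return CapturedFeature(
        attributes=dict(attributes),
        metadata={
            "captured_by": captured_by,
            "source": source,
            "captured_at": datetime.now(timezone.utc).isoformat(),
            "rules_applied": [desc for desc, _ in rules],
        },
    )


feature = capture({"road_class": "A", "width_m": 6.0}, "north team", "GPS survey")
```

Keeping the rule descriptions in the metadata means a later central merge can see not just who captured a feature, but which version of the rules it was held to.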
I wonder how issues of data quality and metadata will fare in these economically challenged times.
Laine
10 Jun 10 at 6:39 pm
Laine, definitely. Getting the spatial side of things right is important, but yes, the data model itself is very, very important too. At an attribute level, where regional data mismatches across boundaries, or, worse still, an implied match occurs, it reduces the overall value of the data: it becomes less than the sum of its parts. As an example, if a field is named the same thing across two regions but actually contains semantically different data, the end result will not just be incorrect, but may lead to wrong decisions.
In that respect, it’s important to get the data model right in the first place. However, in the case where data has already been captured and requires a ‘central merge’, it’s important that metadata provides clear semantics on what the data mean.
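As a small illustration of how field-level metadata can expose an implied match, the sketch below flags field names that appear in more than one region with different recorded definitions. It assumes each region supplies a simple name-to-definition dictionary, which is a hypothetical format. Comparing definition strings is only a crude proxy for comparing semantics. Even so, it shows where metadata earns its keep during a central merge.

```python
from typing import Dict, List, Set

# Hypothetical field-level metadata: region -> {field name: recorded definition}.
RegionFieldDefs = Dict[str, Dict[str, str]]


def implied_matches(defs: RegionFieldDefs) -> List[str]:
    """Field names shared across regions whose recorded definitions differ."""
    definitions_by_field: Dict[str, Set[str]] = {}
    for fields in defs.values():
        for name, definition in fields.items():
            # Normalise whitespace and case so trivial differences don't count.
            normalised = " ".join(definition.lower().split())
            definitions_by_field.setdefault(name, set()).add(normalised)
    return sorted(name for name, found in definitions_by_field.items()
                  if len(found) > 1)


regional_defs = {
    "north": {"width": "Carriageway width in metres",
              "class": "Road hierarchy code"},
    "south": {"width": "Carriageway width in feet",
              "class": "road hierarchy code"},
}
print(implied_matches(regional_defs))  # ['width']
```

Here `width` is flagged because one region records metres and the other feet. `class` is not flagged, since its two definitions differ only in capitalisation.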
As you say, let's hope metadata continues to be captured and seen as important. If it isn't, data can easily lose its value.
stu
10 Jun 10 at 6:51 pm