Kessler BlogGIS: "Coincidentally" the Geodatabase, Part 2

"Coincidentally"

This is part 2 of this topic about geodatabases. Below is a review of where we're at.

The plan:

This series of posts focuses on feature coincidence. Getting there will take several weeks of posts. Here's my plan:

Part 1 -- introduce the feature dataset and how to add your shapefiles

Part 2 -- introduce topology and two very useful rules: Must Not Overlap and Must Not Have Gaps

Part 3 -- introduce working with coincidence and Area Boundary Must Be Covered By Boundary Of

Part 4 -- editing Area Boundary Must Be Covered By Boundary Of errors

This information is taken in part from my ArcGIS Desktop: Geodatabase Power User class. For more in-depth detail, you can sign up for my class. More information at Web Classes.

Required software:

Standard or Advanced version of ArcGIS Desktop (it used to be called ArcEditor and ArcInfo) version 10.1. (Much of this can be done with older versions of the software. I cannot keep track of what all works and doesn't work with the older versions.) Since we will be building geodatabases in this series, Basic ArcGIS (ArcView) users don't have all the capabilities.

Topology:

In simple terms, topology is a set of spatial rules that enforce geometric relationships, that is, how features exist in relation to their neighbors. There are 32 rules you can work with. Here is one common rule.

Why use topology? The easiest reason to use topology is to check and improve your data. Spatial integrity is important to most users.

When you model geographic features, you may find that you want to model some features that have spatial relationships with other features around them. Countries might be modeled such that adjacent countries meet without gaps along a common border line but never overlap. States or provinces could be modeled such that they fall exclusively within one country. Streets may be modeled such that two streets always meet at an intersection and never share a segment. Bus stops may be modeled such that they must always occur on a street or certain type of street. These relationships are maintained in the geodatabase through an association called a topology.

Topologies enable richer analytical functions in your GIS. For instance, geometric networks are a special kind of topology and allow routing and upstream/downstream modeling.

Topologies also allow you to perform sophisticated editing. For instance, you can edit a county boundary at the same time as a city boundary or a road if they are, in fact, coincident (which is the objective of this blog).

Why are you not using topology? (Well, in fact, you might. But, if you aren't...) Many people wait to build topology because they’ve seen or heard about involved designs taking weeks or months to build, test, and implement. They just don't have the time for an involved design.

You don’t have to wait! You can capitalize on some of the benefits of topology right away with rules such as Must Not Overlap or Must Not Have Gaps. Very simple rules that help immediately and powerfully!

Certainly, your organization may want to construct a complex schema, but there is no reason not to start with a few baby steps and build up from there.

Creating Topology:

A topology is a set of properties. Right-click on a feature dataset (which you learned about in the last part) and choose New > Topology.

Name:

Each topology must have a name. Be sure to name it something that makes sense for your application.

Cluster tolerance:

Cluster tolerance is the minimum distance between vertices of features. Vertices that fall within the cluster tolerance are defined as coincident and are snapped together. This may affect endpoints, vertices along a line segment, or lines of adjoining features or feature classes.

Image extracted from the ESRI help documentation

The cluster tolerance is very important to establishing topology. In effect, it says that two points describing the same or nearly the same location cannot exist on top of each other. For instance, a line would not have two identical points describing its shape. Clustering also ‘snaps’ features together, as in the figure above. In this example, the software connects the two features and eliminates one vertex. Hopefully, this would eliminate some ‘Must Not Have Dangles’ errors.
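Here's a rough Python sketch of the snapping idea, just to make it concrete. It is not how the ArcGIS topology engine is implemented. The function name, the coordinates, and the "first vertex wins" behavior are all my own simplifying assumptions.

```python
import math

def snap_vertices(vertices, tolerance):
    """Drop any vertex that lies within `tolerance` of a vertex already kept.

    Simplification: the first vertex seen wins. The real engine also
    considers rank and may average positions.
    """
    kept = []
    for x, y in vertices:
        for kx, ky in kept:
            if math.hypot(x - kx, y - ky) <= tolerance:
                break  # coincident with a kept vertex, so it is merged away
        else:
            kept.append((x, y))
    return kept

# Hypothetical coordinates: two line ends about 0.0005 units apart, plus one far vertex.
line_end = (10.0, 5.0)
neighbor_start = (10.0004, 5.0003)
far_vertex = (12.0, 5.0)

print(snap_vertices([line_end, neighbor_start, far_vertex], tolerance=0.001))
# [(10.0, 5.0), (12.0, 5.0)]
```

With a smaller tolerance (say 0.0001) all three vertices survive, which is the whole point: the tolerance decides what counts as "the same place".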

Rank:

Ranking your feature classes is very important since you may not want all your data to move during the verification of rules. A rank of 1 is the highest rank, meaning it is less likely to move. If you have two feature classes with one ranked 1 and the other 2, then the features ranked 2 will move or snap (up to the maximum of the cluster tolerance) to the associated features that were ranked 1. This implies an intrinsic level of accuracy or reliability for the rank 1 data.

Bear in mind, just because you have a rank of 1 on a feature class does not mean that that data won't move. The cluster tolerance can and will move your coordinate data.
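To picture how rank steers the snapping, here is a small Python sketch. It only models the cross-rank case (lower-ranked vertices snapping to higher-ranked ones) and ignores movement among features of the same rank, which is why the rank 1 data never moves here even though it can in practice. The feature class names and coordinates are hypothetical.

```python
import math

def snap_to_higher_rank(features, tolerance):
    """features maps a name to (rank, vertices); rank 1 is the highest rank.

    Each vertex moves to the nearest vertex of any better-ranked feature
    class that lies within `tolerance`; otherwise it stays where it is.
    """
    def distance(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    snapped = {}
    for name, (rank, vertices) in features.items():
        # Candidate targets: original vertices of feature classes with a lower rank number.
        targets = [v for r, vs in features.values() if r < rank for v in vs]
        moved = []
        for vertex in vertices:
            near = [t for t in targets if distance(vertex, t) <= tolerance]
            moved.append(min(near, key=lambda t: distance(vertex, t)) if near else vertex)
        snapped[name] = moved
    return snapped

features = {
    "survey_parcels": (1, [(0.0, 0.0), (100.0, 0.0)]),
    "digitized_roads": (2, [(0.02, 0.01), (50.0, 30.0)]),
}
result = snap_to_higher_rank(features, tolerance=0.05)
print(result["digitized_roads"])  # [(0.0, 0.0), (50.0, 30.0)]
print(result["survey_parcels"])   # [(0.0, 0.0), (100.0, 0.0)]
```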

Rules:

There are 32 possible rules you may use (but not all apply to all feature classes). These are organized by the types of features they work with: points, lines, and polygons. I use the chart supplied by ESRI as my first reference for rules. You can find this (if you have a local install) at

c:/Program Files/ArcGIS/Desktop10.1/Documentation/topology_rules_poster.pdf

Reading rules takes a little practice, and the ESRI documentation has more help on that. I suggest you also read further in the documentation and find out all you can about each rule.

The Must Be Larger Than Cluster Tolerance rule: This rule is applied within each line or polygon feature class used in a topology. Vertices that fall within the cluster tolerance are defined as coincident and snapped together. Any polygon or line feature that would collapse when validating the topology is an error.
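A quick way to think about "would collapse" is sketched below in Python. I'm assuming a deliberately crude test: a feature collapses when every one of its vertices sits within the tolerance of its first vertex. The real check is more involved, so treat this as an illustration only, with made-up coordinates.

```python
import math

def would_collapse(vertices, tolerance):
    """Crude test: True if every vertex is within `tolerance` of the first one."""
    first_x, first_y = vertices[0]
    return all(
        math.hypot(x - first_x, y - first_y) <= tolerance
        for x, y in vertices
    )

sliver = [(0.0, 0.0), (0.0005, 0.0), (0.0005, 0.0004)]  # tiny leftover polygon
road = [(0.0, 0.0), (25.0, 0.0)]                        # ordinary line

print(would_collapse(sliver, tolerance=0.001))  # True
print(would_collapse(road, tolerance=0.001))    # False
```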

Validating Topology:

After setting rules for your topology, you will need to validate the rules. During this process two major activities take place: cracking and clustering.

Cracking happens when the vertex of one feature in the topology is within the cluster tolerance of an edge of any other feature in the topology. The topology engine then creates a new vertex on the edge (cracking) to allow the features to be geometrically integrated in the clustering process.
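Cracking is easier to see in code than in words. The Python sketch below projects a vertex onto a single edge and, if the vertex is within the tolerance, inserts a new vertex on the edge at that spot. The function name and coordinates are hypothetical, and it handles one edge and one vertex only.

```python
import math

def crack_edge(edge_start, edge_end, vertex, tolerance):
    """Return the edge's vertices, with a new one added if `vertex` is close enough."""
    (x1, y1), (x2, y2), (px, py) = edge_start, edge_end, vertex
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return [edge_start, edge_end]  # degenerate edge, nothing to crack

    # Position of the nearest point along the edge (0 = start, 1 = end).
    t = ((px - x1) * dx + (py - y1) * dy) / length_sq
    if t <= 0 or t >= 1:
        return [edge_start, edge_end]  # nearest point is an endpoint, not the edge interior

    nearest = (x1 + t * dx, y1 + t * dy)
    if math.hypot(px - nearest[0], py - nearest[1]) > tolerance:
        return [edge_start, edge_end]  # too far away to interact

    return [edge_start, nearest, edge_end]  # the edge is cracked at `nearest`

print(crack_edge((0.0, 0.0), (10.0, 0.0), (4.0, 0.002), tolerance=0.01))
# [(0.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
```

Once the new vertex exists, the clustering step has something to snap to.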

Clustering happens next. All vertices of any feature class that participate in a topology can potentially be moved if they fall within the cluster tolerance of another vertex. Vertices of higher-ranking features will not move towards lower-ranking features, but vertices of equal-ranked features will be geometrically averaged. For instance, endpoints have a higher rank than edges, so the edge will crack and move to the endpoint.

Image extracted from the ESRI help documentation

The combination of cluster tolerance and ranking controls your result during validation. Results may be different than shown above.

You may wish to run multiple tests changing ranking and clustering so you fully understand what will happen with your data before you implement the rules and parameters on your full database. If you choose to do this, make a copy of your data before you start! Validation cannot be reversed, at least in ArcCatalog!

I once had to create a dataset of all the roads for 5 counties in Washington. They needed to be "topologically correct" in that all roads must connect correctly, there were to be no overlaps, and a few other basic needs had to be met so that transportation modelling could occur. The roads for each county were in various states of "cleanliness". Suffice it to say that they were NOT clean. I had intersections that did not intersect, county boundaries that "shifted" the data (as they were collected by different people and methods), duplicate lines, overshoots, and many other small errors. All these roads were segmented in that they started and stopped at each intersection. I had a couple hundred thousand road segments. Validation took about 150 hours! Now this was a while back when I didn't have as powerful a machine. The point here is that this process may take a huge amount of processing time.

There are ways to help organize and reduce your work when validating a topology, but that's beyond the scope of this blog. Read the documentation, test your ideas on SMALL sets of data, think about your goals and critically evaluate your results. I can tell you from experience, I've been very surprised by some of my results.

Errors:

Validation finds errors. An error is a violation of a rule.

An exception is an error that you have determined should be allowed. For example, roads typically have lots of dangles, so you may wish to mark some of them as exceptions. The exception might be a cul-de-sac or a road that actually dead-ends.

Dirty areas are regions surrounding features that have been altered by editing after the initial topology validation process and require an additional topology validation. This additional validation may well create new errors which you may need to take action upon.

All of these help you keep track of and identify ‘issues’ in your data. Just remember, an issue may not be a problem or even a data error.
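If it helps, here is the error versus exception distinction as a tiny Python sketch. The class, field names, and feature IDs are hypothetical and are not the way the geodatabase stores this information.

```python
from dataclasses import dataclass

@dataclass
class TopologyError:
    rule: str                   # the rule that was violated
    feature_id: int             # the offending feature (made-up IDs below)
    is_exception: bool = False  # True once you decide the violation is acceptable

def outstanding(errors):
    """Errors that still need attention, i.e. those not marked as exceptions."""
    return [e for e in errors if not e.is_exception]

errors = [
    TopologyError("Must Not Have Dangles", 101),
    TopologyError("Must Not Have Dangles", 102),
    TopologyError("Must Not Overlap", 205),
]

errors[0].is_exception = True  # feature 101 is a real cul-de-sac, so allow it

print([e.feature_id for e in outstanding(errors)])  # [102, 205]
```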

Visualizing Errors:

Here are the properties for an example topology:

Note the cluster tolerance happens to be very small. This is the minimum distance that can be used and it's automatically calculated by the software. You can increase the cluster tolerance, but why in this case? Keep the data as precise as originally collected.

When applying rules to a single feature class, ranking is irrelevant.

I've applied two simple rules that I feel are basic for most parcel feature classes. Parcels need to be full coverage and non-overlapping.
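I built this topology through the ArcCatalog dialogs, but the same setup can be scripted. The Python sketch below lists the geoprocessing calls I would expect to need, using the Topology toolset names (Create Topology, Add Feature Class To Topology, Add Rule To Topology, Validate Topology). The geodatabase path, topology name, and feature class name are hypothetical, and passing an empty string to get the default cluster tolerance is my assumption; check the tool reference for your version before running it. The plan is built as plain data so you can review it first, and `run_steps` only works where arcpy is installed.

```python
def topology_steps(feature_dataset, topology_name, feature_class, cluster_tolerance=""):
    """Return (tool name, arguments) pairs for a two-rule parcel topology."""
    topology = feature_dataset + "/" + topology_name
    parcels = feature_dataset + "/" + feature_class
    return [
        ("CreateTopology_management", (feature_dataset, topology_name, cluster_tolerance)),
        # XY rank 1 and Z rank 1; with a single feature class the rank is irrelevant.
        ("AddFeatureClassToTopology_management", (topology, parcels, 1, 1)),
        ("AddRuleToTopology_management", (topology, "Must Not Overlap (Area)", parcels)),
        ("AddRuleToTopology_management", (topology, "Must Not Have Gaps (Area)", parcels)),
        ("ValidateTopology_management", (topology,)),
    ]

def run_steps(steps):
    """Execute the plan. Needs an ArcGIS install; run it on a COPY of your data."""
    import arcpy  # imported here so the plan can be inspected without ArcGIS
    for tool, args in steps:
        getattr(arcpy, tool)(*args)

# Hypothetical paths and names, for illustration only.
steps = topology_steps("C:/data/county.gdb/Cadastral", "Parcels_Topology", "Parcels")
for tool, args in steps:
    print(tool, args)
```

Remember that validation cannot be undone, so try this on a small copy first.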

Okay, so now let's look at some results.

The image above shows an error highlighted in pink that goes around the exterior of all the parcels. This error always occurs with the Must Not Have Gaps rule. If you look closely in the middle of the densest parcels (upper middle), you will see a pink spot.

Zoomed into that spot, you can see very easily that a parcel is missing. This is an error related to the Must Not Have Gaps rule. It's very easy to locate gaps in your data, particularly when you have thousands of polygons to look at.
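Why does the outside edge always show up as an error? One way to reason about it, sketched in Python below, is to count how many polygons use each edge. An edge used by only one polygon has nothing on its other side, so it borders either a true gap or the outside world. This is my own simplified model, not the ArcGIS algorithm. It assumes neighbors share exactly the same vertices, and the 3 by 3 block of square parcels with the middle one missing is invented for the example.

```python
from collections import Counter

def unshared_edges(polygons):
    """Edges that belong to exactly one polygon (nothing on the other side)."""
    counts = Counter()
    for ring in polygons:
        for a, b in zip(ring, ring[1:] + ring[:1]):
            counts[frozenset((a, b))] += 1  # direction of the edge does not matter
    return [tuple(sorted(edge)) for edge, n in counts.items() if n == 1]

def count_rings(edges):
    """Count separate connected loops formed by the given edges."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, rings = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        rings += 1
        stack = [start]
        while stack:
            vertex = stack.pop()
            if vertex not in seen:
                seen.add(vertex)
                stack.extend(neighbors[vertex] - seen)
    return rings

def square(col, row):
    return [(col, row), (col + 1, row), (col + 1, row + 1), (col, row + 1)]

# Eight unit-square parcels; the center one at (1, 1) is missing.
parcels = [square(c, r) for r in range(3) for c in range(3) if (c, r) != (1, 1)]
edges = unshared_edges(parcels)
print(len(edges), count_rings(edges))  # 16 2
```

Two loops come back: the 12-edge outer boundary, which you would normally mark as an exception, and the 4-edge loop around the missing parcel, which is the real problem.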

Zooming into the pink blotch towards the lower right of the parcels, you will see a lot of parcels not only outlined but filled in as pink. These happen to be exact duplicates of parcels. What likely happened here is that this was an area of development that got updated and an insert was performed twice. You would never be able to visualize this issue when looking at the parcels, whether or not you filled them in with a solid color. Additionally, there seems to be a line problem, yet we are working with polygons. In the next part of this blog, we'll take a look at this error, and others to see what is going on.
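Exact duplicates like these are a special case of overlap that you can also hunt for outside of topology. The Python sketch below groups polygons by a normalized version of their vertex ring, so two copies match even if they start at a different vertex or run in the opposite direction. It only catches vertex-for-vertex duplicates, it assumes rings are stored without a repeated closing vertex, and the parcel IDs and coordinates are made up.

```python
from collections import defaultdict

def canonical_ring(ring):
    """Same key for a ring regardless of its start vertex or direction."""
    def rotate(points):
        start = points.index(min(points))  # begin at the smallest vertex
        return tuple(points[start:] + points[:start])
    return min(rotate(list(ring)), rotate(list(reversed(ring))))

def find_duplicates(parcels):
    """parcels maps an ID to a vertex ring; returns groups of IDs with identical shapes."""
    groups = defaultdict(list)
    for parcel_id, ring in parcels.items():
        groups[canonical_ring(ring)].append(parcel_id)
    return [ids for ids in groups.values() if len(ids) > 1]

parcels = {
    101: [(0, 0), (10, 0), (10, 10), (0, 10)],
    102: [(10, 0), (20, 0), (20, 10), (10, 10)],
    103: [(10, 10), (0, 10), (0, 0), (10, 0)],  # same shape as 101, different start vertex
}
print(find_duplicates(parcels))  # [[101, 103]]
```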

Errors can be visualized a number of ways. You also get a report of all the errors to help you sort out what is happening. There's not enough room in this blog to go into these other tools you have available. Again, this topic is a subset of my ArcGIS Desktop: Geodatabase Power User class. For more in-depth detail, you can sign up for my class which is coming up in a couple of weeks. More information at Web Classes.

Notes About Topology:

A topology must be in a feature dataset (FDS)
All feature classes involved in a topology must be within the FDS
An FDS may have many topologies
A feature class can be involved in only one topology
A feature class may have many rules applied to it within that topology
All points, lines and polygons have the potential to move.
Just because a feature class can only be involved in one topology doesn’t mean that it can’t be involved in multiple rules.
When creating or validating a topology, an exclusive lock is required (since this is a schema change and will likely be moving data). An exclusive lock can only be acquired if no other locks (shared or exclusive) are already on the data. If there are already other locks on the feature class or table, ArcCatalog will not be able to establish its exclusive lock, and the schema will not be editable. Once an exclusive lock has been acquired, no shared locks can be applied, so the data will not be accessible in ArcMap or ArcCatalog. A common ‘shared lock’ that will prohibit the construction of a topology is an ArcMap document that is open and ‘reading’ the data.
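The locking rule in that last note trips up a lot of people, so here it is as a toy Python model. This is only a picture of the rule as described above (an exclusive lock needs zero other locks, and blocks new shared locks). It is not how the geodatabase manages locks internally, and the class and holder names are hypothetical.

```python
class SchemaLocks:
    """Toy model of shared and exclusive locks on one dataset."""

    def __init__(self):
        self.shared = set()    # holders currently reading the data
        self.exclusive = None  # the single holder allowed to change the schema

    def acquire_shared(self, holder):
        if self.exclusive is not None:
            return False  # data is locked for a schema change
        self.shared.add(holder)
        return True

    def acquire_exclusive(self, holder):
        if self.exclusive is not None or self.shared:
            return False  # any existing lock blocks an exclusive lock
        self.exclusive = holder
        return True

    def release(self, holder):
        self.shared.discard(holder)
        if self.exclusive == holder:
            self.exclusive = None

locks = SchemaLocks()
print(locks.acquire_shared("ArcMap document"))  # True
print(locks.acquire_exclusive("ArcCatalog"))    # False, the map document is reading the data
locks.release("ArcMap document")                # close the map document
print(locks.acquire_exclusive("ArcCatalog"))    # True
print(locks.acquire_shared("ArcMap document"))  # False until ArcCatalog is done
```

In practice the fix is the same as in the model: close anything that is reading the data before you build or validate the topology.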

I need your help:

This sounds really silly, but I love the beginning image and having Dr. Evil and Mini-Me for the first two posts, but now I'm running out of ideas for the opening image! Any ideas for parts 3 and 4?

As always, I appreciate your comments, suggestions, and even spelling corrections!

Happy Geodatabasing!

Kessler BlogGIS

Tuesday, May 7, 2013