Monday, February 18, 2013

Getting rid of donut holes

I recently had a student ask about getting rid of small holes in her data. She had created a polygon feature class from a raster analysis. Some of you may have seen this happen when working with classified or sliced data. Or, you may have other donuts made by lakes, river corridors, rights-of-way, etc.

There are at least two ways to attack this challenge.

Method 1: If you have many holes (hundreds or thousands), you may want to try the Eliminate Polygon Part tool (which is only available with an ArcGIS Advanced license). I ran a number of tests and wasn't fully satisfied by the AREA condition of the tool, but I did like the PERCENT option. Let's look at that.

Here's my test data:
The big box with its smaller square on the left is one multi-part polygon. The box on the right is another polygon.
Here's how I applied the tool:


Notice how I am overwriting the output (since I tested this a bunch) and that I'm using the Optional Condition PERCENT set at 5%. The result:


Here are some other results from just changing the percent amount to 6% and 8%:



The optional condition of AREA never worked no matter what size I set it at. Is it a bug? I'm not sure. More testing is necessary. Anyone out there had success with AREA?
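
If you would rather script the PERCENT run than click through the dialog, here is a minimal arcpy sketch. The function name and the dataset names are placeholders of mine, and the keyword names follow the Eliminate Polygon Part parameters as I understand them, so check them against the tool help for your ArcGIS version.

```python
def fill_small_holes(in_features, out_features, percent=5):
    """Remove polygon parts smaller than `percent` of a feature's outer area."""
    # Imported inside the function so the file can be read without ArcGIS;
    # running it needs arcpy and an Advanced license.
    import arcpy

    # Same idea as overwriting the output in the dialog while testing.
    arcpy.env.overwriteOutput = True

    arcpy.EliminatePolygonPart_management(
        in_features,
        out_features,
        condition="PERCENT",
        part_area_percent=percent,
        part_option="CONTAINED_ONLY",  # only parts inside other parts (the tool default)
    )
    return out_features
```

With hypothetical dataset names, fill_small_holes("test_polys", "test_polys_filled", 8) would correspond to the 8% run.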

Method 2: If you have only a few to do, you can remove them in an edit session. Starting with the same test data, follow these steps.

Step 1: Start editing
Step 2: Double-click the donut feature (not the hole). 
Your cursor now turns white outlined in black.

Step 3: Point at the edge of the hole you wish to delete, right-click and choose Part > Delete.

Result is below:
So there you are... Experiment and have fun!



Thursday, February 7, 2013

ArcGIS Dissolve challenge

Ragnvald Larsen has a solution for Dissolve hanging when there are a lot of features. Additionally, he proposes that there is a threshold for efficiency when running Dissolve. Take a look at his article, "Solving the ESRI arcpy dissolve challenge."

Note that his solution would work outside of Python. For instance, you may use ModelBuilder to construct this logical model.

I suspect, though, that if he were to convert the shapefile to a file geodatabase, the results would be different. When you have 'big' data, shapefiles tend not to work well.
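
For anyone who wants to try that, here is a rough arcpy sketch of copying a shapefile into a new file geodatabase before running Dissolve. The function name, the folder, and the default geodatabase name are my own placeholders, not anything from Ragnvald's article.

```python
import os


def shapefile_to_fgdb(shapefile, gdb_folder, gdb_name="work.gdb"):
    """Copy a shapefile into a file geodatabase and return the new path."""
    # Imported inside the function so the file can be read without ArcGIS.
    import arcpy

    gdb_path = os.path.join(gdb_folder, gdb_name)
    if not arcpy.Exists(gdb_path):
        arcpy.CreateFileGDB_management(gdb_folder, gdb_name)

    # Geodatabase names are stricter than file names, so let arcpy clean it up.
    base_name = os.path.splitext(os.path.basename(shapefile))[0]
    out_name = arcpy.ValidateTableName(base_name, gdb_path)

    arcpy.FeatureClassToFeatureClass_conversion(shapefile, gdb_path, out_name)
    return os.path.join(gdb_path, out_name)
```

The returned path can then be used as the input to Dissolve in place of the original shapefile.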

I've just received a great explanation of a possible cause for this and a solution that could shed some light on this. It's from Charles Convis of ESRI. I've highlighted some statements that I think are valuable. Thank you, Charles.


Hi, I'm working with datasets in the many millions of features with lots of vector processes including dissolves. A possible source of your problem is "godzilla polygons", i.e. single polygons with a large number of vertices. I would suspect this is very likely with the Norwegian coastline. Godzillas will often hang and crash without informative errors. Godzillas are also common when working with data from different scales, and data that was originally hand-digitized by someone who didn't know the difference between streaming and point modes, i.e. they are more common than you think.
Here is a systematic way to deal with them:

1. Add a vertexcount field to your attribute table and calc it to !shape.pointcount!, as in:
arcpy.AddField_management(gpoly, "VERTEXCOUNT", "LONG")
arcpy.CalculateField_management(gpoly, "VERTEXCOUNT", "!shape.pointcount!", "PYTHON", "")

2. Open up your attribute table and sort descending on VERTEXCOUNT to get a quick summary look at your possible godzilla polygons. Depending upon your hardware, anything over 10,000 vertices can cause problems. Geodatabases on a higher-end machine can handle 50,000 for most processes.

3. You get rid of vertices with the dice command, using the limit you determine from the exercise above and some old fashioned trial and error on your machine, as in:
arcpy.Dice_management(gpoly, gpolydice, 50000)
Dice is analogous to the script you wrote, but rather than lowering feature counts by splitting files, it cuts large polygons up so they'll behave. (If your script split up your files along arbitrary boundaries, you would have been achieving the same effect of cutting up large polygons at the same time as you were lowering your feature counts in each file.)

4. Now your polygons should be much more amenable to all of the rest of your processes. Also you are more likely to be able to successfully run any of the other more standard polygon simplify commands that thin or generalize your linework so as to have fewer vertices.

5. In the end, a simple dissolve will get rid of your dice lines, but it's worth re-calculating your vertexcount just to make sure you didn't inadvertently create godzillas with your dissolve operations.  Godzillas are a common side-effect of dissolves.

general tips for handling problems and crashes:

6. If possible, move to a file geodatabase; stability and capability are orders of magnitude greater than with shapefiles. 7,000 polygons may stress a shapefile, but it won't make a geodatabase even break a sweat. I run geodatabases with 5 million features on an average PC often.

7. If possible, fire up Task Manager and watch your processes while they are running. CPU percentage is less informative than physical memory usage. A normal process will run along at, say, 50% RAM usage with plenty of fluctuations up and down, sometimes strong fluctuations. That's normal. The behavior of a runaway process is often to ramp up linearly and steadily with no fluctuations. If it hits 100% and stays there, you likely have a crash. Try watching it sometime when it's running a job you are having problems with and you may find other early warning signs.

8. As I've said several times before, problems with ArcGIS processing can more often be traced to these kinds of data issues than to faults in the software. Sure there are bugs, but in my experience problems in the datasets themselves are a lot more common. Also, as a general observation, software issues seem to me to manifest as soon as I enter the command. Data issues I uncover tend to show up later on during processing.

regards,
Charles Convis
Esri Conservation Program
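
Steps 1 through 5 of Charles's list could be strung together in a short arcpy script. What follows is only a sketch under a few assumptions of mine: the function names are made up, the field name is shortened to VERTEXCNT so it also fits the 10-character limit on shapefile field names, and the dissolve field is whatever attribute you are dissolving on. The tool calls themselves are the ones from the email, plus Dissolve.

```python
def update_vertex_count(fc, field="VERTEXCNT"):
    """Steps 1 and 5: store each feature's vertex count in an attribute field."""
    # Imported inside the function so the file can be read without ArcGIS.
    import arcpy

    # Only add the field if it is not there yet, so reruns don't fail.
    if field not in [f.name for f in arcpy.ListFields(fc)]:
        arcpy.AddField_management(fc, field, "LONG")
    # Same expression and expression type as in the email.
    arcpy.CalculateField_management(fc, field, "!shape.pointcount!", "PYTHON")


def dice_and_dissolve(gpoly, gpolydice, gpolydissolved,
                      vertex_limit=50000, dissolve_field=None):
    """Count vertices, dice the godzillas, dissolve, then count again."""
    import arcpy

    update_vertex_count(gpoly)

    # Step 3: cut up any polygon with more vertices than the limit.
    arcpy.Dice_management(gpoly, gpolydice, vertex_limit)

    # Step 4 (simplifying, generalizing, other processing) would go here.

    # Step 5: the dissolve removes the dice lines again...
    arcpy.Dissolve_management(gpolydice, gpolydissolved, dissolve_field)
    # ...so recount to see whether it recreated any godzillas.
    update_vertex_count(gpolydissolved)
    return gpolydissolved
```

Step 2 stays manual: after the first count, sort the attribute table descending on the vertex count field and pick a vertex_limit that suits your machine.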