Vishful thinking…

ArcObjects, PostGIS and spatial operations performance

Posted in GIS by viswaug on December 19, 2008

 

I have caught myself cursing at ArcObjects numerous times for not being performant in some of the spatial operations I have done with it. I have also heard the same complaint from other developers I have come across. But what I was always curious about was how ArcObjects compares in performance to open-source spatial libraries like PostGIS. Since I have always been swimming in the ESRI and .NET ocean, I have little or no knowledge of how the PostGIS and Java worlds perform. So, it was interesting to listen to John Coryat share his thoughts on how using the PostGIS library (that sits on top of the PostgreSQL database) makes spatial operations 10 to 100 times slower than if those operations were performed using the spatial operations built into PostgreSQL.

Both ArcObjects and PostGIS suffer from the same drawback that makes them significantly slower: they are both spatial libraries that run outside of the process space of the database where the spatial data is stored. Please see the comments from Paul Ramsey below explaining that PostGIS actually runs within the process space of the database and how recent changes have made it considerably faster. ArcObjects runs outside of the process space of the database where the spatial data is stored, and that takes a major toll on performance. Most databases these days, including MS SQL Server 2008, pack a powerful set of spatial operations built right in that can satisfy the requirements for some cases but definitely not all of them. The spatial operations that are built right into the databases should always be preferred over external spatial libraries since they run orders of magnitude faster. But the problem is that the spatial operations included in those databases are not comprehensive, and they don’t all pack the same geometry operations either. Most of them, like SQL Server 2008, do support the SQL functions outlined by the OGC, but some don’t. The most notable exception would be MySQL.
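
To make the in-database versus external-library distinction a little more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: SQLite stands in for a real spatial database, a plain bounding-box filter stands in for a real spatial operation, and the table and function names (points, filter_in_database, filter_in_client) are made up for this example. It does not measure ArcObjects or PostGIS. All it shows is how many rows have to leave the database in each approach.

```python
import sqlite3

# Hypothetical bounding box used for the filter: xmin, ymin, xmax, ymax.
BBOX = (10.0, 2.0, 20.0, 4.0)


def make_db(n=1000):
    """Build an in-memory table of n points laid out on a 100-wide grid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    rows = [(i, float(i % 100), float(i // 100)) for i in range(n)]
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)
    return conn


def filter_in_database(conn, bbox):
    """Let the database apply the filter; only matching rows are returned."""
    xmin, ymin, xmax, ymax = bbox
    cur = conn.execute(
        "SELECT id FROM points "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY id",
        (xmin, xmax, ymin, ymax),
    )
    ids = [row[0] for row in cur]
    return ids, len(ids)  # (matching ids, rows handed to the caller)


def filter_in_client(conn, bbox):
    """Fetch every row, then filter outside the database (external-library style)."""
    xmin, ymin, xmax, ymax = bbox
    rows = conn.execute("SELECT id, x, y FROM points ORDER BY id").fetchall()
    ids = [i for i, x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]
    return ids, len(rows)  # (matching ids, rows handed to the caller)


if __name__ == "__main__":
    conn = make_db()
    in_db_ids, in_db_moved = filter_in_database(conn, BBOX)
    client_ids, client_moved = filter_in_client(conn, BBOX)
    print("same answer:", in_db_ids == client_ids)
    print("rows moved, in-database filter:", in_db_moved)
    print("rows moved, client-side filter:", client_moved)
```

Both functions return the same ids, but the client-side version has to pull the whole table across before it can discard most of it. In a real out-of-process setup that transfer also involves serialization and a trip between processes or machines, which this sketch does not model.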

To overcome these gaps or to add more capability to the spatial operations in these databases, an external spatial library becomes a necessity. Expecting the database vendors to develop spatial operations and to have them keep up with developments in the industry would not be an attractive option. This puts us, the spatial developers, in a catch-22 situation wherein we need to give up performance to gain more powerful spatial capabilities. So, we may not be able to FIX the performance issues with spatial operations but can only patch them as needed.

Even though John Coryat talks about his experience comparing PostGIS and native PostgreSQL performance, that still doesn’t give us a comparison between ArcObjects and PostGIS performance. It would be interesting to see some performance benchmarks between the two, though.

2 Responses


  1. Paul Ramsey said, on December 20, 2008 at 4:01 pm

    First of all, PostGIS and ArcObjects do not suffer from the same drawback (being run out-of-process) since PostGIS is definitively in-process in PostgreSQL (as Oracle Spatial is in Oracle and SQLServer Spatial is in SQLServer). To demonstrate (destructively), if you can find a PostGIS function that causes a crash, you’ll note that the PostgreSQL back-end doesn’t survive the crash: they are in the same process, using the same memory pools. You are right that ArcObjects is its own separate process, attaching to the database. Take down ArcObjects and the database continues, and vice versa.

    There are very few things that PostgreSQL can do, natively, that count as interesting spatial operations, so my guess is that Coryat was doing a simple point-in-polygon test. Indeed, at the time this video was recorded, the implementation was probably via GEOS, which I found in profiling did involve (a) an enormous amount of memcpy’ing and (b) building up far more topological information than is strictly needed to return true/false on a point-in-polygon test. So for simpler cases, like point-in-polygon, we have removed the GEOS dependency.

    The current version of PostGIS will probably blow the socks off Coryat’s test for point-in-polygon, since it is now implemented natively in PostGIS, rather than delegating to the GEOS library (one hunk of memcpy’ing avoided), it uses an internal caching/indexing scheme to make the point-in-polygon tests cost O(log(n)) instead of O(n), and it avoids deserializing the whole candidate geometries on every function pass (more memcpy’ing avoided).

    In general, even if the idea “PostgreSQL is faster than PostGIS” were correct, the practical utility of the observation would be limited: there is just so little that native PostgreSQL can do spatially. And for most areas of functional overlap (bounding box search, point-in-polygon tests) there is also now no performance difference. There are also many simple things native PostgreSQL cannot do: objects larger than 8KB, objects with holes, aggregate geometry, spatial selectivity estimates, etc.

  2. Steven Citron-Pousty said, on December 23, 2008 at 1:55 am

    You will never see those benchmarks since ESRI, like most closed-source vendors, forbids benchmarks as part of its EULA.
    You could do it internally and just tell the world which you found to be faster, but you could not benchmark and publish the results.

