Vishful thinking…

TableauShapeMaker – Adding custom shapes to Tableau maps

Posted in Uncategorized by viswaug on June 29, 2015

Hello fellow map herders! I recently wrote this little utility that converts shapefiles into a CSV format that Tableau can consume. Tableau has a collection of built-in shapes that are sufficient for most mapping needs, but sometimes there are valid reasons for wanting to add custom lines or polygons to a Tableau map. The utility processes a shapefile in any coordinate system, optionally simplifies the shapes to reduce the number of vertices being displayed, and outputs a CSV file that can be consumed by Tableau and blended with other data sets to produce the required visualization.

-h, -?, --help Show this message and exits
-i, --input=VALUE (Required) Input shape file in any geographic or projected coordinate system
-o, --output=VALUE (Optional) Output CSV file name
-t, --tolerance=VALUE (Optional) Tolerance value to use for simplifying the shape.

Example command

TableauShapeMaker.exe -i "C:\VA_JURIS.shp" -o juris.csv -t 0.001
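The post doesn't say which simplification algorithm the -t tolerance drives. A common choice for this kind of vertex reduction is Douglas-Peucker, so here is a minimal JavaScript sketch of it. It assumes planar [x, y] coordinates and a tolerance in the same units as the data (for geographic data, 0.001 would be roughly a thousandth of a degree). It is an illustration, not TableauShapeMaker's actual code.

```javascript
// Distance from point p to the segment a-b (planar coordinates).
function distanceToSegment(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const lenSq = dx * dx + dy * dy;
  if (lenSq === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  // Project p onto the segment and clamp the projection to its endpoints.
  const t = Math.max(0, Math.min(1, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / lenSq));
  return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
}

// Douglas-Peucker: keep the farthest vertex if it exceeds the tolerance and recurse.
function simplify(points, tolerance) {
  if (points.length <= 2) return points.slice();
  const last = points.length - 1;
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < last; i++) {
    const d = distanceToSegment(points[i], points[0], points[last]);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }
  if (maxDist > tolerance) {
    const left = simplify(points.slice(0, index + 1), tolerance);
    const right = simplify(points.slice(index), tolerance);
    return left.slice(0, -1).concat(right); // drop the duplicated split vertex
  }
  return [points[0], points[last]];
}
```

For polygon rings, a real tool would also need to guard against a ring collapsing below four points when the tolerance is large.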

The utility can properly handle multi-part polygon/line shapes like the Hawaiian islands. However, it cannot convert holes in polygons, because I could not find information on how to represent them in the Tableau format. If anyone can help me with the representation for holes, I can add that feature to the utility too.

To display the shapes from the generated CSV file on a Tableau map, add the CSV file as a data source in Tableau. The “Latitude” and “Longitude” fields should be identified as “Measures”; if they are not, please mark them as such. The “Point ID”, “Polygon ID” and “Sub Polygon ID” fields should be identified as “Dimensions”. Double-click the “Latitude” and “Longitude” measures to add them to the row and column shelves. Select “Polygon” from the “Marks” drop-down. Drag the “Point ID” dimension to the “Path” shelf, then drag both the “Polygon ID” and “Sub Polygon ID” dimensions to the “Detail” shelf. This should display the shapes on the map. To color the shapes by a certain dimension, say “Name”, drag the “Name” dimension to the “Color” shelf.
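To make the Point ID / Polygon ID / Sub Polygon ID layout concrete, here is a hypothetical JavaScript sketch that flattens multi-part shapes into CSV rows with those columns. The column order, the zero-based IDs, the extra Name column and the input shape are all assumptions for illustration; the real TableauShapeMaker output may differ. Coordinates are assumed to already be WGS 84 longitude/latitude.

```javascript
// Quote a CSV field only when it contains a comma, quote or newline.
function csvField(value) {
  const s = String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// features: [{ name, parts: [[[lon, lat], ...], ...] }] (hypothetical input shape)
function toTableauCsv(features) {
  const lines = ["Polygon ID,Sub Polygon ID,Point ID,Latitude,Longitude,Name"];
  features.forEach((feature, polygonId) => {
    // Each part of a multi-part shape (e.g. one Hawaiian island) becomes a sub polygon.
    feature.parts.forEach((part, subPolygonId) => {
      // Point ID records the vertex order that Tableau's Path shelf follows.
      part.forEach(([lon, lat], pointId) => {
        lines.push([polygonId, subPolygonId, pointId, lat, lon, csvField(feature.name)].join(","));
      });
    });
  });
  return lines.join("\n");
}
```

Point ID on the Path shelf gives Tableau the drawing order, while Polygon ID and Sub Polygon ID on the Detail shelf keep each part of a multi-part shape as its own closed outline.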

Since holes are not supported by TableauShapeMaker, you will need to make sure that smaller shapes are drawn on top of larger shapes wherever the two overlap. The drawing order of the shapes can be rearranged by dragging their legend entries to the right position in the legend list.

Obligatory screenshot…


Download TableauShapeMaker

Even though talking about ‘big data’ is the hipster thing to do, let’s talk about ‘large data’ (spatial)

Posted in GIS by viswaug on July 25, 2012

First things first, ‘large data’ is a term I made up for the sole purpose of discussing spatial data where the individual shapes in every record are themselves huge :). Data of this sort poses its own unique problems, although these issues are solvable with a little ingenuity. The ‘big data’ problem is much harder to solve and seems to be getting a lot of attention lately.

Some examples of this kind of data are US state boundaries, county boundaries, etc. The individual state shapes in a US state boundary dataset that has not been generalized (or simplified) can be formidable by themselves, and the shapes are generally much bigger for the coastal states than for the landlocked ones. In one of the state boundary datasets I am working with, Alaska’s state shape serialized to JSON was close to 5 MB by itself. Pushing such large data as-is to the client will just slow down the loading and rendering of web maps. There are also other datasets, like eco-regions, where the individual region shapes are the union of multiple state shapes.

As you can guess, the eco-region shapes are also rather large. Such datasets create problems on multiple fronts. Because of the sheer size of these shapes, they need to be generalized before they can be sent to the client (browser) for display. Generalizing such shapes uses up a chunk of CPU time, so the generalized shapes really need to be cached for your app to scale. Caching static datasets with such large shapes shouldn’t be too much of an issue, but it is one more thing that needs to be done to handle the largeness of the shapes :)
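The caching idea can be sketched as a small memoizing wrapper keyed by shape id and tolerance, so each large shape is loaded and generalized only once per tolerance. This is a minimal in-memory JavaScript sketch: the class name is hypothetical, the simplify function is passed in (any generalization routine would do), and a production setup would more likely use an external cache or pre-generalized tables with an eviction policy.

```javascript
// Caches generalized shapes keyed by shape id and tolerance.
class GeneralizedShapeCache {
  constructor(simplifyFn) {
    this.simplifyFn = simplifyFn; // e.g. a Douglas-Peucker implementation
    this.entries = new Map();
    this.misses = 0;
  }

  // loadShape(shapeId) fetches the full-resolution geometry; it is only called on a miss.
  get(shapeId, tolerance, loadShape) {
    const key = `${shapeId}|${tolerance}`;
    if (!this.entries.has(key)) {
      this.misses++;
      const full = loadShape(shapeId);
      this.entries.set(key, this.simplifyFn(full, tolerance));
    }
    return this.entries.get(key);
  }
}
```

Passing a loader instead of the coordinates means the multi-megabyte original (Alaska, say) is never even read from storage once its generalized version is cached.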

Another issue posed by large datasets concerns the spatial operations that need to be performed with them. Spatial indexes in databases make simple spatial operations like point-in-polygon really fast, even against a large and complex polygon. But calculating the intersection of two or more large and complex polygons can still be a CPU-intensive operation that takes a while. For example, with a couple of large datasets like eco-regions and watersheds, finding all forest stand areas that lie within eco-region ‘A’ and watershed ‘2’ can still be time-consuming, because determining whether an area lies completely within other large shapes is more expensive than checking whether it intersects them. And for more complex spatial operations that span multiple large datasets, the number of CPU cycles needed increases drastically.
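Spatial indexes generally work in two phases: a cheap filter on bounding boxes, then an exact refine step against the real geometry. The JavaScript below sketches both phases for a single polygon ring (no holes, planar coordinates), using a bounding-box check followed by ray casting. It is an illustration of the idea, not how any particular database implements it. The exact test is one pass over the vertices, whereas a naive polygon-polygon intersection compares every edge of one shape against every edge of the other, which hints at why the intersection case gets expensive so quickly.

```javascript
// Axis-aligned bounding box of a ring: the cheap "filter" step.
function bbox(ring) {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const [x, y] of ring) {
    minX = Math.min(minX, x);
    minY = Math.min(minY, y);
    maxX = Math.max(maxX, x);
    maxY = Math.max(maxY, y);
  }
  return { minX, minY, maxX, maxY };
}

// Ray casting (even-odd rule): the exact "refine" step, linear in the vertex count.
function pointInRing([x, y], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses = (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

function pointInPolygon(point, ring, box = bbox(ring)) {
  const [x, y] = point;
  // Points outside the box are rejected without walking the vertices.
  if (x < box.minX || x > box.maxX || y < box.minY || y > box.maxY) return false;
  return pointInRing(point, ring);
}
```

In practice the box would be computed once and stored alongside the shape, which is essentially what an index does for you.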

Most databases these days, like MS SQL Server, Oracle and PostGIS, come packaged with really great spatial data types and spatial operation capabilities that make the lives of application developers much easier. Saying that these databases have greatly simplified things would actually be an understatement. But for transaction processing applications, the database needs to be highly available, very fast and scalable, and it must support multiple concurrent users. Here comes the dilemma: if we perform such taxing spatial operations in the database, they can consume most of the CPU and keep utilization near 100% for the duration of the operation. Other requests coming into the database will then either be processed more slowly, since they have to wait for CPU cycles to free up, or simply time out because the database has been unavailable for too long. That is not a good thing to have happen to an online transaction processing database.

Another huge culprit here is the serialization and de-serialization of large geometries into WKT/JSON/WKB to get shapes in and out of the database. Some databases, like MS SQL Server, provide CLR data types that eliminate the need for serialization/deserialization; with other databases you might be out of luck, depending on the technologies you have to use. So yes, ST_GeomFromText, ST_GeomFromGeoJSON and ST_AsGeoJSON are amazingly useful, but it is good to be aware of their side effects.

With good caching solutions, a lot of the problems described here can be alleviated, but they are problems nonetheless and don’t make your life easier :) Let me know your thoughts…

Something to consider before using relational storage as a service for your app

Posted in Uncategorized by viswaug on July 21, 2012

If you are planning on using any of the great services available these days that let you store, retrieve, update and delete (perform CRUD operations on) your relational data (geographic or not) through an HTTP API, here is something you might need to consider. Some of these services are Google Fusion Tables, ArcGIS Online, CartoDB, ArcGIS Server 10 and up, etc. Even though the backend storage for Google Fusion Tables is not relational, the service presents a relational facade to the user by allowing tables to be merged, etc.

All these services allow you to perform CRUD operations on your data over HTTP. This makes things really easy for front-end developers, since they can perform CRUD operations right from JavaScript, perhaps using the web server only as a proxy that relays requests to the storage service.

But one of the things these services take away from us is the ability to perform multiple CRUD operations as one atomic unit. For example, assume that our ArcGIS Server map service contains a polygon layer along with a standalone table holding attribute data related (one-to-many) to the polygon layer. If one of our requirements is to update a single row in the polygon layer table AND one or more related rows in the standalone table, such that the updates either all succeed or all fail together, this becomes very hard to accomplish (if it is possible at all). We could end up with the polygon layer table getting updated while the update on the related table fails for one of many possible reasons. This can leave bad data in the database and cause a loss of data integrity.
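The usual workaround is a compensating-action approach: apply each request in turn and, if one fails, send "undo" requests for the ones that already succeeded. The JavaScript below is a minimal sketch of that pattern; the step objects are hypothetical stand-ins for HTTP calls, and no real service API is used. Note that this is not atomic: other clients can see the half-updated state, and the undo request itself can fail.

```javascript
// Each step is { apply, undo }: async functions standing in for HTTP requests
// to the storage service (hypothetical; no specific service API is assumed).
async function runWithCompensation(steps) {
  const completed = [];
  for (const step of steps) {
    try {
      await step.apply();
      completed.push(step);
    } catch (error) {
      // Best-effort rollback of the steps that already succeeded, newest first.
      const undoFailures = [];
      for (const done of completed.reverse()) {
        try {
          await done.undo();
        } catch (undoError) {
          undoFailures.push(undoError); // the data is now genuinely inconsistent
        }
      }
      return { ok: false, error, undoFailures };
    }
  }
  return { ok: true };
}
```

Reporting `undoFailures` separately matters, because that is exactly the case where a true transaction would have protected you and this workaround cannot.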

Non-relational storage services are less affected by this drawback, since designing data for those stores expects us to handle this up front, based on how the app uses the data being stored.

If you have any thoughts or suggestions on this issue, please leave a comment.

Using client side routing in web mapping applications

Posted in GIS, javascript by viswaug on February 7, 2012

Thought I would share how I have been using client side routing in a web mapping application that I have been working on for a little while now. The application is coming along nicely, and I have been having quite a blast writing CoffeeScript for the app. If you haven’t looked into CoffeeScript yet, please consider doing so. I found it to be very helpful.

But CoffeeScript is not the point of this post; there are plenty of resources on the web to help you pick it up. Client side routing is something I have been doing with backbone.js. backbone.js is a neat little library that helps developers build cleaner web apps by encouraging an MVC style of development in web pages. backbone.js might seem a little intimidating at first, but if you stick with it, you will find it very helpful. That said, I wouldn’t suggest using backbone.js for simple web pages. It is mainly targeted at single page web apps where you are writing a lot of JavaScript and updating only some sections of the web page based on AJAX requests.

Speaking of single page web apps, I think a lot of web mapping apps fall into that bucket, especially since a lot of GIS users expect to see all their favorite ArcMap functionality in their web apps too. That is a battle I have been fighting for a while now, and I have a sneaky suspicion that there is still a long way to go. That said, I have been able to push a ‘workflow driven app’ over a ‘map driven workflow’ a little further along than I could in my earlier projects. The feedback so far has not been too shabby either. As far as I can tell, no user has reported the ‘Map Toolbar’ missing :)

The client side routing technique uses the hash portion of the URL to refresh subsections of a web page, optionally through AJAX, without the need for a full page postback. OK, so what does all this have to do with web mapping apps? Well, I am leveraging client side routing to let users select features on the map via the URL. Hopefully, the screenshots below will shed more light on it.

The above screenshot shows the URL in the browser address bar reflecting the selected feature on the map. The URL in the address bar reads “…/StewardshipPlan/18701/Stands#/Stand/1904”. The initial “…/StewardshipPlan/18701/Stands” portion of the URL identifies the page currently loaded by the browser; that page contains the map the user interacts with. The latter “#/Stand/1904” portion is the client side route, which lets backbone.js call our JavaScript that selects the feature on the map. The selected feature on the map and the client side route in the address bar are kept in sync. That is, the user can type the URL into the address bar and the page will open with that feature selected on the map, or the user can browse around the map and select other features by clicking them, and the client side route will update to reflect the current selection. Here is another screenshot with another feature selected.
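backbone.js does this matching for you, but the mechanism itself is small. Here is a hand-rolled JavaScript sketch (not Backbone's implementation) that matches a hash such as "#/Stand/1904" against patterns with `:param` segments and dispatches to a handler. The route table and handler are hypothetical, and the browser wiring is shown only as comments.

```javascript
// Match "/Stand/1904" against "/Stand/:id"; returns params or null.
function matchRoute(pattern, path) {
  const p = pattern.split("/");
  const s = path.split("/");
  if (p.length !== s.length) return null;
  const params = {};
  for (let i = 0; i < p.length; i++) {
    if (p[i].startsWith(":")) params[p[i].slice(1)] = decodeURIComponent(s[i]);
    else if (p[i] !== s[i]) return null;
  }
  return params;
}

function createRouter(routes) {
  return {
    // Dispatch a location hash such as "#/Stand/1904" to the first matching handler.
    dispatch(hash) {
      const path = hash.replace(/^#/, "");
      for (const [pattern, handler] of Object.entries(routes)) {
        const params = matchRoute(pattern, path);
        if (params) {
          handler(params);
          return true;
        }
      }
      return false;
    },
  };
}

// In a browser you would wire it up roughly like this (not executed here):
// window.addEventListener("hashchange", () => router.dispatch(location.hash));
// and when the user clicks a feature on the map: location.hash = "#/Stand/" + featureId;
```

Because clicking a feature only changes the hash and the hash change drives the selection, the map and the address bar stay in sync in both directions, and the back button works for free.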

That is a neat trick, but what does it buy me? Well, for starters, you can email URLs for features to the people you are collaborating with, and when they browse to the link, they will arrive at that specific feature on the map with its attribute window open and ready for editing if needed. Also, though I am not sure how useful this is, if you want to go back to your previously selected feature, just click the browser back button. As a developer, building my app this way helped me get rid of a lot of gluey jQuery event binding on button click events and replace it with plain links on the page that trigger the right JavaScript via client side routing. It also almost forces me to write modular JavaScript, which is very important for maintenance.

I have taken this line of thought a little further and made our split and merge functionality routable too. Check out the screenshots.

The merge and split URLs perform the right feature selection and enable all the right map tools when the user navigates to the URLs shown in the screenshots.

This is the first time I have used client side routing to build web mapping features like the ones described above. So far, I would say it has worked out pretty well from a developer’s perspective. Any thoughts or suggestions are welcome and would be appreciated.


Posted in .NET, ArcGIS, C#, ESRI, GIS, Utilities by viswaug on June 28, 2011

I also created this bare-bones MBTiles cache viewer to view tile caches in the MBTiles format. The application does not do much; it just displays the tile cache on a map. To view an MBTiles file, just fire up the viewer and start dragging and dropping MBTiles cache files onto it. You can drag and drop multiple MBTiles cache files at the same time if needed. You can create MBTiles caches with TileCutter. The viewer was created using the ESRI WPF map control. If you would like to see this viewer do more, let me know :)
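Inside an MBTiles file, tiles live in a SQLite table named `tiles` with `zoom_level`, `tile_column`, `tile_row` and `tile_data` columns, and the spec uses the TMS row order, which is flipped relative to the XYZ order most map controls request. The viewer itself is built on the ESRI WPF control, so the JavaScript below is only a language-neutral sketch of the lookup any such viewer has to perform. The function names are hypothetical, and the SQL is built as a string rather than executed.

```javascript
// MBTiles stores tile_row in TMS order (row 0 at the bottom), while most
// web map controls address tiles XYZ-style (row 0 at the top).
function xyzRowToTmsRow(zoom, y) {
  return (1 << zoom) - 1 - y; // the flip is its own inverse
}

// Parameterized query for one tile from the MBTiles "tiles" table.
function tileQuery(zoom, x, y) {
  return {
    sql: "SELECT tile_data FROM tiles WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
    params: [zoom, x, xyzRowToTmsRow(zoom, y)],
  };
}
```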

MBTilesViewer can be downloaded here.

TileCutter update – With support for OSM and WMS map services

Posted in .NET, ArcGIS, C#, ESRI, GIS, OpenSource, Utilities by viswaug on June 28, 2011

I just added support for creating MBTiles caches for WMS map services and for downloading OSM tiles into the MBTiles format. An MBTiles cache for a WMS map service should improve map rendering performance, but why did I add support for OSM tile sets? Well, they come in handy for disconnected/offline use cases. So, here are some usage examples:

ArcGIS Dynamic Map Service:

TileCutter.exe -z=7 -Z=9 -x=-95.844727 -y=35.978006 -X=-88.989258 -Y=40.563895 -o="C:\LocalCache\ags.s3db" -t=agsd -m=""

WMS Map Service 1.1.1:

TileCutter.exe -z=7 -Z=9 -x=-95.844727 -y=35.978006 -X=-88.989258 -Y=40.563895 -o="C:\LocalCache\wms111.s3db" -t=wms1.1.1 -m=""

WMS Map Service 1.3.0:

TileCutter.exe -z=7 -Z=9 -x=-95.844727 -y=35.978006 -X=-88.989258 -Y=40.563895 -o="C:\LocalCache\wms130.s3db" -t=wms1.3.0 -m=""

OSM Tiles:

TileCutter.exe -z=7 -Z=9 -x=-95.844727 -y=35.978006 -X=-88.989258 -Y=40.563895 -o="C:\LocalCache\osm.s3db" -t=osm -m=""

You can always type “TileCutter -h” for usage information.

Want to customize the parameters with which the maps are generated? Just use the “-s” command line option and specify the settings in query string format.

TileCutter.exe -z=7 -Z=9 -x=-95.844727 -y=35.978006 -X=-88.989258 -Y=40.563895 -o="C:\LocalCache\ags.s3db" -t=agsd -m="" -s="transparent=true&format=jpeg"
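How TileCutter applies the -s string internally isn't described here. One plausible reading is that its key/value pairs override the default parameters of each map request, and the JavaScript below is a minimal sketch under that assumption. The default parameter names and values (format, transparent, dpi) are hypothetical.

```javascript
// Merge "-s" style query-string overrides into default request parameters.
function mergeSettings(defaults, overrides) {
  const params = new URLSearchParams(defaults);
  for (const [key, value] of new URLSearchParams(overrides)) {
    params.set(key, value); // an override replaces the default in place
  }
  return params;
}

// Hypothetical defaults, overridden by the value passed to -s.
const params = mergeSettings(
  { format: "png", transparent: "false", dpi: "96" },
  "transparent=true&format=jpeg"
);
console.log(params.toString()); // format=jpeg&transparent=true&dpi=96
```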

Also, if some of the tile requests result in errors, the level, column, row and error message information would be logged into a text file in the same directory as the MBTiles cache.

And now for the best new feature of TileCutter: the program does not store duplicate tiles. That is, if the area you are caching has a lot of empty tiles in the ocean etc., the MBTiles cache created by TileCutter will store only one tile for all of those duplicated tile images. That should help save disk space :)
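The post doesn't describe how TileCutter detects duplicates. A common approach is to key each tile image by a hash of its bytes and keep one copy per hash; the MBTiles spec only requires `tiles` to be a table or view, so a typical de-duplicated layout stores distinct images once and exposes `tiles` as a view joining a map table to an images table. The JavaScript (Node) sketch below models those two tables as in-memory Maps; the class and field names are hypothetical.

```javascript
const crypto = require("crypto");

// In-memory stand-in for a de-duplicated MBTiles layout:
// "images" holds each distinct tile once, "map" points every z/x/y at an image.
class DedupTileStore {
  constructor() {
    this.images = new Map(); // tile_id (content hash) -> tile bytes
    this.map = new Map(); // "z/x/y" -> tile_id
  }

  put(z, x, y, bytes) {
    const tileId = crypto.createHash("sha1").update(bytes).digest("hex");
    if (!this.images.has(tileId)) this.images.set(tileId, bytes); // new image only
    this.map.set(`${z}/${x}/${y}`, tileId);
    return tileId;
  }

  get(z, x, y) {
    const tileId = this.map.get(`${z}/${x}/${y}`);
    return tileId === undefined ? undefined : this.images.get(tileId);
  }
}
```

A thousand identical ocean tiles then cost one image plus a thousand small map rows.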

TileCutter can be downloaded here

TileCutter – A small utility to generate tile cache in the MBTiles format from ArcGIS Dynamic Map Services

Posted in C#, ESRI, GIS, Uncategorized by viswaug on June 12, 2011

Thought I would share a little utility I had written up to generate tile caches in the MBTiles format for ArcGIS Dynamic Map Services. The MBTiles cache format is very simple and makes moving caches between machines very easy since you just have to transfer one file instead of the thousands of files that need to be copied for normal tile caches. The TileCutter is a console utility and accepts the scale range and the extent in latitude/longitude for which the cache should be generated. It also takes a few other options listed below.

  -h, --help                 Show this message and exits
  -m, --mapservice=VALUE     Url of the ArcGIS Dynamic Map Service to be
                               cached
  -o, --output=VALUE         Location on disk where the tile cache will be
                               created
  -z, --minz=VALUE           Minimum zoom scale at which to begin caching
  -Z, --maxz=VALUE           Maximum zoom scale at which to end caching
  -x, --minx=VALUE           Minimum X coordinate value of the extent to cache
  -y, --miny=VALUE           Minimum Y coordinate value of the extent to cache
  -X, --maxx=VALUE           Maximum X coordinate value of the extent to cache
  -Y, --maxy=VALUE           Maximum Y coordinate value of the extent to cache
  -p, --parallelops=VALUE    Limits the number of concurrent operations run
                               by TileCutter
  -r, --replace=VALUE        Delete existing tile cache MBTiles database if
                               already present and create a new one.

Example Usage:

To try it out, just run TileCutter.exe; it will download tiles for some default extents from an ESRI sample server.

TileCutter -z=7 -Z=10 -x=-95.844727 -y=35.978006 -X=-88.989258 -Y=40.563895 -o="C:\LocalCache" -m=""
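Given the -z/-Z zoom range and the -x/-y/-X/-Y extent, a tile cutter has to work out which tiles cover that extent at each zoom level. The JavaScript below sketches that calculation assuming the standard Web Mercator XYZ ("slippy map") tile grid; the post doesn't state which grid TileCutter uses, so treat that as an assumption, and the function names are hypothetical.

```javascript
const MAX_LAT = 85.0511287798; // Web Mercator latitude limit

function lonToTileX(lon, zoom) {
  const n = 2 ** zoom;
  return Math.min(n - 1, Math.max(0, Math.floor(((lon + 180) / 360) * n)));
}

function latToTileY(lat, zoom) {
  const n = 2 ** zoom;
  const clamped = Math.max(-MAX_LAT, Math.min(MAX_LAT, lat));
  const rad = (clamped * Math.PI) / 180;
  const y = Math.floor(((1 - Math.log(Math.tan(rad) + 1 / Math.cos(rad)) / Math.PI) / 2) * n);
  return Math.min(n - 1, Math.max(0, y));
}

// Tile range covering a lon/lat extent at one zoom level (XYZ rows, row 0 at the top).
function tileRange(minx, miny, maxx, maxy, zoom) {
  const xMin = lonToTileX(minx, zoom);
  const xMax = lonToTileX(maxx, zoom);
  const yMin = latToTileY(maxy, zoom); // the north edge maps to the smaller row
  const yMax = latToTileY(miny, zoom);
  return { xMin, xMax, yMin, yMax, count: (xMax - xMin + 1) * (yMax - yMin + 1) };
}
```

Looping `tileRange` from the -z level to the -Z level gives the full work list; under these assumptions the example extent works out to a 4 x 3 block of tiles at zoom 7, and each extra zoom level roughly quadruples the count.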

TileCutter requests tiles in parallel threads and allows you to control the number of concurrent operations. I don’t want to write too much about it yet, since I am still working on adding more capabilities. I am planning to provide the ability to generate MBTiles caches for WMS services, OSM tiles, etc. in the future, and more if there is interest. I am also planning to implement a way to avoid storing duplicate tiles (for example, empty tiles in the ocean). I just wanted to get the tool out there early to get feedback and gauge interest. So, if you have queries/interest/feature requests, let me know :)
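TileCutter is a C# tool and its threading isn't described here, so as a language-neutral illustration, this JavaScript sketch caps the number of in-flight requests the way a -p/--parallelops limit might: a small pool of workers pulls tasks from a shared queue. The function name is hypothetical.

```javascript
// Runs async task functions with at most `limit` in flight at once.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // shared cursor; safe because this JavaScript runs on one thread
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A real tile downloader would catch per-tile errors inside each task (TileCutter logs failed tiles to a text file) rather than letting one failure reject the whole batch.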

Also, I will blog soon about a little piece of code for an IHttpHandler that serves up MBTiles to web clients. Stay tuned…

TileCutter can be downloaded here.

Moving away from the javascript module pattern

Posted in javascript by viswaug on March 2, 2011

A while ago I wrote about using the JavaScript module pattern to organize code a little better. But of late, I have moved away from the module pattern for a few reasons, which I will outline below. That said, which pattern to use is really a matter of preference in my opinion. I have now gone back to basics with the JavaScript prototype pattern and have been loving it. Here is why I made the switch:

  • the ‘this’ keyword in my class method means what I really expect/want it to mean
  • creating multiple instances of my class doesn’t consume more memory for creating more function instances since all instances use the functions on the prototype
  • the ‘instanceof’ operator can be used to determine if any object is an instance of the class

Here is a simple example illustrating how to create a class. The snippet below creates a class called ‘MyClass’ with the ‘getOption’, ‘calculatePay’, ‘getDisplayPay’ methods.

function MyClass(options) { // constructor
  this.options = options;
}

MyClass.prototype.getOption = function(name) {
  return this.options[name];
};

MyClass.prototype.calculatePay = function(hours) {
  return hours * this.options['hourlyRate'];
};

MyClass.prototype.getDisplayPay = function(hours) {
  return this.getOption('name') + " - " + this.calculatePay(hours);
};

To create an instance of the class above

var inst = new MyClass({'name': 'Jeff', 'hourlyRate': 1000});

Also, the following works too

inst.constructor === MyClass; // returns true

inst instanceof MyClass; // evaluates to true
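To check the memory claim in the bullets above, this snippet uses a trimmed copy of MyClass plus a hypothetical second instance, and compares it with a module-pattern style factory. Prototype methods are a single shared function object, while the factory creates a new closure for every object.

```javascript
function MyClass(options) {
  this.options = options;
}

MyClass.prototype.getOption = function (name) {
  return this.options[name];
};

// Module-pattern style factory, for comparison: each call creates a new closure.
function makeModule(options) {
  return {
    getOption(name) {
      return options[name];
    },
  };
}

const jeff = new MyClass({ name: "Jeff", hourlyRate: 1000 });
const ann = new MyClass({ name: "Ann", hourlyRate: 50 }); // hypothetical second instance

console.log(jeff.getOption === ann.getOption); // true: one shared function on the prototype
console.log(Object.prototype.hasOwnProperty.call(jeff, "getOption")); // false: not copied per instance
console.log(jeff instanceof MyClass); // true
console.log(makeModule({}).getOption === makeModule({}).getOption); // false: one function per object
```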

Pretty simple and sweet. Here are a couple of examples of classes written like the one above:



I am also using the standalone YUITest for unit testing JavaScript. It doesn’t require the YUI framework, which makes it look even more attractive in YUI3. It was a very close call between YUITest and QUnit for the unit testing framework; I went with YUITest because it came with Selenium drivers.

YUITest example for g2kml

YUITest example for g2geojson

Digging a little deeper into Google Fusion Tables – A technical GIS perspective

Posted in GIS by viswaug on February 14, 2011

Before I get into too much detail about Google Fusion Tables, I will provide a link out to Google Fusion Tables. If you haven’t heard of or played around with it yet, I would strongly encourage you to take a little time to do so. It will not disappoint you and is definitely worth the time. It is still in ‘Beta’, but it is looking good.

The ‘Map’ and ‘Intensity Map’ visualizations of the table data should be of special interest to all the GIS folks. They make the process of mapping data really easy. The ‘Location’ field type in Fusion Tables supports both street address strings and KML string representations of geometries. Street addresses entered into the Location field are automatically geocoded and are viewable in the map visualization.

  • Even though the documentation doesn’t explicitly state it, the Location field supports the KML ‘MultiGeometry’ representation alongside the Point, LineString and Polygon representations
  • This might be pretty obvious, but it supports only the WGS 84 geographic coordinate system, just like KML.
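Since the Location field takes KML strings, here is a hedged JavaScript sketch that builds Polygon and MultiGeometry strings from WGS 84 [longitude, latitude] rings. Holes (innerBoundaryIs) are left out, the function names are hypothetical, and whether Fusion Tables accepted every variant of this markup is worth testing against your own table.

```javascript
// KML coordinates are "lon,lat" tuples separated by whitespace (WGS 84).
function ringToKmlCoordinates(ring) {
  return ring.map(([lon, lat]) => `${lon},${lat}`).join(" ");
}

// Single polygon with an outer boundary only (no holes).
function polygonToKml(outerRing) {
  return (
    "<Polygon><outerBoundaryIs><LinearRing><coordinates>" +
    ringToKmlCoordinates(outerRing) +
    "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
  );
}

// Several polygons wrapped in one MultiGeometry, e.g. a multi-part state shape.
function multiPolygonToKml(outerRings) {
  return "<MultiGeometry>" + outerRings.map(polygonToKml).join("") + "</MultiGeometry>";
}
```

Note that these strings contain commas, which matters when the values come back in query results.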

The map visualization also currently supports simple thematic rendering of maps based on certain column values. Some of the documentation on how to achieve it and the available options is not very easy to find, so I thought a link to the documentation might help later. The list of available map markers is here.

There is also a very good collection of publicly available data on Fusion Tables. Check out the USA State and County boundaries here. There is a wealth of other publicly available information on Fusion Tables already, and it can be easily located with a simple search. You can also upload ‘.csv’, ‘.kml’ and spreadsheet files to Fusion Tables, and the ‘ShpEscape’ site lets us upload ‘.shp’ files too. Once uploaded, data can simply be shared via a URL rather than emailing ‘.shp’ files around as attachments. Government agencies are also taking to Fusion Tables; check out data from USDA NRCS, the State of California and Natural Earth Vector data. I am hoping the list gets bigger. Apart from just sharing data, Fusion Tables lets us easily apply different chart visualizations to the data to glean useful trends and analytics for better informed decision making. Pretty powerful tools to unleash the power of data.

Fusion Tables also lets us merge/join two tables based on a shared key, and create views from base tables in which only a filtered list of rows or columns is visible.

The views feature in Fusion Tables enables us to set user permissions based on columns and rows. To accomplish this, keep the base table private and create views that display only a filtered set of rows or columns. Now, the views alone can be shared with users. Users get access to different views as per their permission set.

Google Fusion Tables also provides a simple and powerful API over HTTP to administer and manage your data. Public tables can be easily managed via simple HTTP requests that identify the table, and private tables can also be managed pretty easily using OAuth-authenticated requests. The API does have some missing features, though. The major one is that Fusion Tables does not support ‘OR’ queries. This missing functionality arises from the fact that Fusion Tables is built on top of Google’s DataStore.

  • Views cannot be created from merged tables, only from base tables
  • The result of querying data from Fusion Tables is a comma-delimited list of field values. Text column values are normally not enclosed in quotes unless they contain commas as part of their value. Location field values are returned as KML string representations, which can contain commas. So, beware of this return format: it throws a monkey wrench into the code needed to split the field values in the Fusion Tables response.
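One way to cope is to split each response line with a small quote-aware parser instead of a plain `split(",")`. This is a minimal JavaScript sketch that assumes the standard CSV convention (RFC 4180): a quoted field is wrapped in double quotes and embedded quotes are doubled. Check that assumption against the responses you actually get; newlines inside quoted fields are not handled.

```javascript
// Split one CSV line, honoring double-quoted fields that may contain commas.
function splitCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    if (inQuotes) {
      if (c === '"') {
        if (line[i + 1] === '"') {
          field += '"'; // doubled quote inside a quoted field
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += c;
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      fields.push(field);
      field = "";
    } else {
      field += c;
    }
  }
  fields.push(field);
  return fields;
}
```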

Google Maps API v3 also supports displaying Fusion Tables data as overlays. The Maps API pop-up bubble can be customized via Fusion Tables using built-in templates or custom HTML templates. The data displayed on the map can also be filtered by providing a query string.

  • Note that the Fusion Tables query used in the Maps API does not like queries of the form ‘SELECT * FROM’. It doesn’t accept the ‘*’ and requires a column name to be specified
  • All Fusion Tables layers on the map are drawn as a single overlay. That is, even if you have 10 Fusion Tables layers on your Google map, the API does not request ‘n’ tiles for each of the 10 layers individually (n*10 images in total); it requests only ‘n’ tiles for all the Fusion Tables layers combined. This is just like how KML layers are handled in the Maps API.

Fusion Tables automatically clusters your data into points on the map at high scale levels. There are also some data serialization limits built into the Fusion Tables API. There is currently no way to display private Fusion Tables data as an overlay in Google Maps API v3, but that feature is supposed to be coming for Google Maps Premier customers. I have submitted a list of feature requests (see below) to the Fusion Tables team; please star them if you would like to see them too.

That said, Fusion Tables is teh awesome.

Well Known Text (WKT) representation for MultiPoint

Posted in GIS, SQL Server 2008 by viswaug on February 13, 2011

Adding to my experience with Oracle from the last post: it turns out that the ‘Well Known Text’ (WKT) representation of MultiPoint geometries differs between Oracle and MS SQL Server/MySQL. Consider the ‘STMPointFromText’ method in MS SQL Server; the representation of the MultiPoint geometry it expects looks like the example below.

MULTIPOINT(-122.360 47.656, -122.343 47.656)

As you can see, the X (longitude) and Y (latitude) values of each point are separated by a space, the coordinate pairs are separated by commas, and that’s it. Here is how Oracle expects the WKT for a MultiPoint geometry to be represented.

MULTIPOINT((-122.360 47.656), (-122.343 47.656))

Oracle expects each coordinate pair to also be enclosed in parentheses. Wikipedia seems to agree with the Oracle representation. Apparently, the initial OGC specifications were not clear, and the community started using the first representation for MultiPoint geometries. Some of the .NET GIS libraries I work with use the first version of the MultiPoint WKT. But OGC has since clarified the specifications, and accordingly the second version used by Oracle is the correct one. MS SQL Server and MySQL use the first version. This has turned out to be a pain for us, and probably for other GIS developers out there too.
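If you have to move MultiPoint WKT between the two camps, a small normalizer helps. This is a minimal JavaScript sketch assuming 2D points and no `MULTIPOINT EMPTY`; the function names are hypothetical, and a real WKT parser would handle far more cases (Z/M values, other geometry types).

```javascript
// Parse either MULTIPOINT(x y, x y) or MULTIPOINT((x y), (x y)) into [[x, y], ...].
function parseMultiPoint(wkt) {
  const m = /^\s*MULTIPOINT\s*\((.*)\)\s*$/i.exec(wkt);
  if (!m) throw new Error("Not a MULTIPOINT: " + wkt);
  return m[1]
    .split(",")
    .map((part) => part.replace(/[()]/g, "").trim().split(/\s+/).map(Number));
}

// Write WKT in the parenthesized (Oracle/OGC-clarified) form or the bare form.
function toMultiPointWkt(points, parenthesized) {
  const parts = points.map(([x, y]) => (parenthesized ? `(${x} ${y})` : `${x} ${y}`));
  return `MULTIPOINT(${parts.join(", ")})`;
}

const pts = parseMultiPoint("MULTIPOINT(-122.360 47.656, -122.343 47.656)");
console.log(toMultiPointWkt(pts, true)); // MULTIPOINT((-122.36 47.656), (-122.343 47.656))
```

Because the parser accepts both forms, the same pair of functions converts in either direction.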

