Review of AssemblySys dataServices
On a large data migration project that I am currently spearheading, we have a large installed userbase of over 2 million users running on a social networking engine. The schema has been redesigned from scratch, and code is being written to match the new schema, using the all-powerful MySQL database as the system to manage all that data.
Since this social network is global, we need good and reliable location information. The current location model is flawed and full of holes, so we have chosen AssemblySys‘ data to replace it.
We are not using AssemblySys’ schema, as we’ve rolled our own. I’ve designed our new schema to be hierarchial in nature, treating all locations on the planet as ‘nodes’ with a tree relationship, with “Earth” being the parent of all nodes. This model allows us to account for all countries and their idiosyncratic ways they divy up their adminstrative divisions, which to say the least varies a lot.
Currently AssemblySys does not have strong support for postal codes, and only about 5 countries use postal codes anyway. However, I was able to secure zip codes from a different vendor and graft them in to our location model.
The AssemblySys location database is quite through and complete, with accurate geodata for the cities. In fact, it is so complete it even lists some towns that don’t show up on Google Maps! I verified that some of these obscurities I found do, in fact, exist.
And I uncovered a good bit of curious geographical trivia, like the fact that there are 5 towns in Kentucky called “Boston”. Must be a nightmare for the Post Office there! I also found there is a town called “Philadelphia” in South Africa! At first, I thought these must be errors, but I verified that these obscure towns do indeed exist.
Next came the task of transforming their location data to our model. This isĀ where I had the most problems, because their data is not arranged in the nice, clean, hierarchical fashion our model is. In fact, it’s laid out in a very cumbersome fashion requiring a number of sub-keys to cull out the proper hierarchy.
To their credit, though, AssemblySys was quick to respond to my questions about how to access their data and shot back examples that was very helpful with the effort. But I felt their model was way too complicated than it needed to be, and perhaps could have used a bit more normalization. But I was able to do the transform after a few days of wrestling with it.
Overall, I am pleased with the quality of the AssemblySys product. I am not happy with their schema layout and the rather obtuse and complicated queries to cull out the structure. However, perhaps most users will use their database as is and perhaps it works better in that context, though the queries can get quite cumbersome from my estimation. The service is good, though completely email-based. The price is reasonable and the data is accurate.
Stumble it!


2 comments
Permalink1
Fred,
Good on you for verifying obscurities! That’s hard to do, and it shows dedication that you’re intent on making sure the data is correct — very important for a DBA.
How did you deal with the schema for the tree heirarchy? Using Mike Hillyer’s method, as in here:
http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
??
Permalink2
Thanks, Sheeri!
I did use an approach similar to Mike Hillyer’s methods. Basically, there is a side table that has a list of parents and children that link back to the node table, allowing me to have children with multiple parents as well as parents with multiple children, so we can group arbitrary sets of locations into “regions” or other types of groups.
Say something