Business GIS for Everyone
This blog takes an informal look into the debates and methods related to business GIS and mapping.
Translating Big Analysis into Big Understanding (and Big Dollars): Part One
Author: Dr. Murray Rice
Advancements in GIS, data mining, and geoanalytics have brought a
wealth of new, powerful, data-based methodologies to a wide range of
small and medium-sized businesses that were totally excluded from
accessing such power even a decade ago. Advanced multivariate
statistical methods take complex datasets and extract core insights
that could not be seen in any other way. The downside of this
explosion of multivariate (and often spatial multivariate) tool
development is that the complex datasets that serve as the input to
the process often lead to complex, difficult-to-understand output as
well.
Business GIS plays a role in cutting through some of this
complexity and accessing core market insight without involving complex
tables or unintuitive graphics. Coupling the communication and
conceptualization power of Geography with the analytical power of
multivariate analytics gives an ever-widening circle of
professionals who are not statistical analysts the ability
to generate and interpret complex results.
The growing field of spatial segmentation is a prime example.
Spatial segmentation involves bringing dozens of variables into
analysis simultaneously, producing a complex but understandable
picture of the structure of local, regional, and national markets.
If we can add data from a business’ own operations to this mix, we
can set up a framework where deep and useful insight can be gained.
Components of a Spatial Segmentation Analysis
Two distinct phases of work are needed to produce a
spatial segmentation analysis:
- Generation of the overall segmentation framework.
This is the biggest and most challenging component of the
analysis, in which every market segment is identified across the
country. This sort of wide-ranging analysis needs to be done only
once, yielding a segmentation framework that can be used and
reused multiple times. Commercial data analytics firms such as
Caliper do this basic work and provide it to their customers,
who then have what they need to complete the second phase.
Spatial segmentation works to make sense of complex
datasets derived from any of many sources. Two major dataset
foundations for spatial segmentation are demographic and
psychographic data. Demographic data record measurable,
census-type characteristics of people and populations, such as
age, years of education, income, and marital status.
Psychographic data also deal with measurable values of people
and populations, but they focus specifically on dimensions
related to preferences, interests, personality, and behaviors
(such as above-average levels of tennis playing, the presence of
Spanish as a second language in a household, or frequent
purchase of books). Combining demographic and psychographic data
in a joint analysis provides a uniquely powerful perspective on
a population of interest.
Datasets that are used in spatial segmentation typically
have a few dozen demographic and psychographic variables, with
values for each variable being known for neighborhoods across
the country (with neighborhoods being based on standard,
widely used geographies such as postal codes, census tracts, or
block groups). The size of these geographical databases alone is
quite large. For example, a typical analysis to create a
segmentation system would bring in all 242,000 block groups in
the US. Adding in a typical number of variables (we’ll use 50
variables in this example) yields an analysis that is based on
12.1 million data values (242,000 × 50). Quite a hefty analytical load!
The goal of this large-scope segmentation analysis is to use
each variable as a point of comparison between and among the
neighborhoods represented in the database. In other words, the
segmentation analysis creates groupings of most similar block
groups based on a comparative analysis of every variable in our
database. By using a broad range and variety of both demographic
and psychographic variables, spatial segmentation produces a
truly robust classification of geographic neighborhoods across
the country.
Once this overall segmentation framework is complete, we are
ready to begin phase 2: using the overall framework to provide
insight into a customer dataset.
- Generation of an individual set of segmentation
results, tailored to the interests of a specific business.
Here we make use of the analytical foundation provided by our
phase 1 work to generate an understanding of the segmentation
characteristics of a particular business’ markets. We do this by
bringing the business’ customer dataset into consideration.
Implementing the segmentation analysis in this way acknowledges
the reality that the creation of a segmentation system (phase 1)
is a difficult task that depends heavily on data and computing
infrastructure. A typical segmentation analysis makes use of a
commercially-available segmentation framework that allows the
analyst to focus on phase 2 tasks and avoid getting bogged down
in the segmentation details of phase 1. Beginning with a
commercially-available segmentation framework also provides a
level of quality checking that would not be possible in a "one
and done" situation where the segmentation is used a single
time.
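To make the phase 1 idea more concrete, here is a minimal Python sketch of grouping similar block groups. It is only an illustration under stated assumptions: plain k-means clustering is swapped in as a stand-in (this post does not say which method Caliper actually uses), and the six toy block groups, their three standardized variable values, and the function names are all made up for the example. The first few lines simply reproduce the data-volume arithmetic quoted above.

```python
import random

# Scale of a full national run, using the figures quoted in the post.
BLOCK_GROUPS = 242_000
VARIABLES = 50
total_values = BLOCK_GROUPS * VARIABLES  # 12,100,000 data values


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Column-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(rows, k, iterations=20, seed=0):
    """Assign each row to one of k groups of similar rows (plain k-means).

    ASSUMPTION: k-means is a stand-in for whatever method a commercial
    segmentation system really uses.
    """
    rng = random.Random(seed)
    centers = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each block group joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = mean_point(members)
    return labels


# Hypothetical block groups: (income, education, tennis-playing index),
# each already scaled to a 0-1 range.
rows = [
    (0.9, 0.8, 0.7), (1.0, 0.9, 0.8), (0.8, 1.0, 0.9),
    (0.1, 0.2, 0.1), (0.2, 0.1, 0.0), (0.0, 0.1, 0.2),
]
labels = k_means(rows, k=2)
```

In this toy run the first three block groups end up sharing one label and the last three share the other. A real framework does the same kind of comparison across dozens of variables and hundreds of thousands of block groups, which is why most analysts start from a commercially available system rather than building their own.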
The possibilities for analysis of customer data are many. The
remainder of this post further discusses the details of phase 1 and
Caliper Corporation's implementation of the segmentation concept in its own
proprietary Maptitude Segmentation System.
Caliper's General Segmentation Framework: What it Provides
Caliper has created a
flexible framework that gives us a ready-to-use set of
segmentation tools. These tools address the basic questions that
underpin a complete and rigorous segmentation analysis.
These questions include:
- What kinds of neighborhoods does the United States have?
Caliper
has done this basic environmental scan for us, creating a roster of
32 unique types of neighborhoods (geodemographic "subsegments") that can be found
somewhere across the United States. Each subsegment belongs to a
color theme that also serves to group the 32 subsegments into 8
larger segments (see Figure 1). Working with the 8 segments somewhat
reduces the power of the analysis but provides the important benefit of
cutting the number of neighborhood groupings to deal with by 75%
(from 32 down to 8).
Figure 1: The Maptitude Segmentation System by Subsegment
- For each of the 32 neighborhood types identified,
where can that neighborhood type be found across any given city,
state, or country?
Here, Caliper has defined
the full geography of each subsegment in its segmentation system
across the entire nation. Figure 2 below shows what this looks like at
a metropolitan scale using the example of Harris County, Texas
(Houston). The map shows where each of Caliper’s 32
neighborhood types can be found in and around Houston. Each
color in this map represents a different neighborhood type, each
of which has a unique demographic and psychographic profile. To
help the reader find a focus within this complex
and potentially confusing map, the Harris County map highlights
two subsegments in particular.
- The "High-Earning Families" subsegment appears across
the map as census tracts shaded dark blue. The
graphic also includes a profile of the subsegment based on some
of the data Caliper used to identify it.
- The "Opulent Homesteads" subsegment also appears on the
map, shaded dark purple. Again, the graphic
provides a brief profile of this distinctive subsegment.
Figure 2: All 32 Subsegments in a Map of Harris County, Texas
- If I have a particular interest in a specific subsegment, how can I track that neighborhood type in particular?
For example, suppose I am interested in the High-Earning
Families subsegment because I know that this group is a great
source of customers for my business. Implementation of this
analysis allows us to isolate that particular subsegment in a
given local market. The map below breaks out the High-Earning
Families subsegment on its own in Harris County.
Figure 3: High-Earning Families Subsegment in Harris County, Texas
This map shows us exactly where this subsegment lives, and
provides the foundation for more analysis that can indicate what set
of business locations would best serve the neighborhoods
our analysis identifies. Clearly, a business serving this
neighborhood type would need to focus on establishing locations in
the northwestern and northeastern suburbs of Houston and avoid
siting facilities in the core areas of Houston.
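Isolating a single subsegment in a local market boils down to a simple filter once every neighborhood carries a subsegment label. The short Python sketch below shows the idea. The tract IDs, the list layout, the "Other Subsegment" placeholder, and the function name are all hypothetical and do not reflect the Maptitude data format; only the two subsegment names and the Harris County example come from this post.

```python
from collections import Counter

# Hypothetical tract records: (tract_id, county, subsegment name).
# The IDs and the "Other Subsegment" label are placeholders.
tracts = [
    ("T001", "Harris County, TX", "High-Earning Families"),
    ("T002", "Harris County, TX", "Opulent Homesteads"),
    ("T003", "Harris County, TX", "High-Earning Families"),
    ("T004", "Fort Bend County, TX", "High-Earning Families"),
    ("T005", "Harris County, TX", "Other Subsegment"),
]


def tracts_in_subsegment(records, county, subsegment):
    """Return the IDs of tracts in one county that carry one subsegment label."""
    return [
        tract_id
        for tract_id, tract_county, tract_subsegment in records
        if tract_county == county and tract_subsegment == subsegment
    ]


# Break out one subsegment on its own, as in the single-subsegment map.
target = tracts_in_subsegment(
    tracts, "Harris County, TX", "High-Earning Families"
)

# Count tracts per subsegment within the county for a quick overview.
counts = Counter(
    subsegment for _, county, subsegment in tracts
    if county == "Harris County, TX"
)
```

In a GIS package the same filter would normally be a selection or query on the tract layer, with the matching tracts drawn as their own map theme.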
To give another example, one more breakout map isolates the
Opulent Homesteads subsegment across Harris County (see Figure 4).
Using this analysis allows us to consider multiple subsegments as we
make further plans to develop and grow our business. This analysis
demonstrates that, although the High-Earning Families and Opulent
Homesteads neighborhood types have some superficial similarity, each
has a distinctive geographic pattern that characterizes the
neighborhoods where each congregates.
Figure 4: Opulent Homesteads Subsegment in Harris County, Texas
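When 32 subsegments are more detail than a given question needs, results can be rolled up to the 8 larger segments mentioned earlier. The Python sketch below illustrates that roll-up and the 75% reduction in groupings. Note the assumptions: the numeric subsegment IDs, the equal split of four subsegments per segment, and the household counts are all invented for illustration, since this post does not describe how Caliper actually assigns subsegments to segments.

```python
SUBSEGMENTS = 32
SEGMENTS = 8

# Share of groupings removed by working at the segment level: 24 / 32 = 0.75.
reduction = (SUBSEGMENTS - SEGMENTS) / SUBSEGMENTS

# ASSUMPTION: subsegments are numbered 1..32 and fall into equal-sized,
# consecutive groups of four. The real assignment may differ.
SUBSEGMENTS_PER_SEGMENT = SUBSEGMENTS // SEGMENTS


def segment_of(subsegment_id):
    """Map a hypothetical subsegment ID (1..32) to a segment ID (1..8)."""
    return (subsegment_id - 1) // SUBSEGMENTS_PER_SEGMENT + 1


# Hypothetical household counts for a few subsegments in one market.
households_by_subsegment = {1: 1200, 2: 800, 5: 500, 32: 300}

# Roll the subsegment totals up to the segment level.
households_by_segment = {}
for subsegment_id, households in households_by_subsegment.items():
    segment_id = segment_of(subsegment_id)
    households_by_segment[segment_id] = (
        households_by_segment.get(segment_id, 0) + households
    )
```

The trade-off is the one described earlier: the segment-level totals are easier to read and compare, but differences between subsegments inside the same segment are no longer visible.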
The key thing to recognize about this analysis so far is that it
is a general framework that helps us to broadly understand the
spatial market structure of regions and metropolitan areas. But note
that to this point the entire analysis is built around generic,
public data and the insights that can be gained from what we earlier
defined as phase 1. What value can adding a business’ own customer
or order data contribute to this analysis (phase 2)? The next blog
post will cover that next step: from generic, broad analytical
framework to specific application based on a business’ own
proprietary data.