Maptitude GISDK Help

Address Standardization

You can standardize a single address, or all of the addresses in a field of a view, with these GISDK functions:

GISDK Function      Summary
Standardize()       Converts an address string into a normalized form for address matching, according to a file of transformation rules
StandardizeView()   Standardizes address strings in a view and writes the results to a table
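
As a rough illustration of what normalizing an address for matching can involve, the following Python sketch (not GISDK code) uppercases an address, applies an ordered list of substitution rules, and collapses extra spaces. The rule list and the function name standardize are hypothetical and hard-coded; they do not reproduce the format or contents of the transformation-rule files that Standardize() and StandardizeView() actually read.

```python
import re

# Hypothetical transformation rules: (regex, replacement) pairs applied in order.
# Standardize() reads its rules from a file; this list only illustrates the idea.
RULES = [
    (r"\bAVENUE\b", "AVE"),
    (r"\bSTREET\b", "ST"),
    (r"\bROAD\b", "RD"),
    (r"[.,]", " "),          # treat periods and commas as spaces
]

def standardize(address):
    """Uppercase the address, apply each rule in turn, and tidy the spacing."""
    text = address.upper()
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return " ".join(text.split())   # collapse runs of whitespace

print(standardize("139 Main Street"))              # 139 MAIN ST
print(standardize("132 Avenue of the Americas."))  # 132 AVE OF THE AMERICAS
```

Two addresses written differently end up with the same normalized form, which is what makes them comparable during address matching.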

 

Parsing Addresses with Regular Expressions

You can also use regular expressions to parse street addresses into components. The RegEx class lets you extract components from an input string and put them into an options array. For example, the following script creates a RegEx object, sets a delimiter, creates fields for a number and for a sequence of words, and adds a rule that parses an address:

// Create a GISDK RegEx object.
rx = CreateObject("RegEx")

// Use a space as the word delimiter.
rx.Delimiters(" ")

// Define a number as a sequence of one or more digits.
rx.Field("number","[0-9]+")
// Define words as a sequence of one or more letters or spaces.
rx.Field("words","[a-z ]+")

// Create a match rule for a number followed by one or more words.
// The number will be stored in the output option called STDNUMBER,
// and the words in the option called STDNAME.
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")

You can apply the rule to one address:

parsed = rx.Match("139 main street")
ShowArray(parsed)

The options array contains:

parsed.STDNUMBER = "139"
parsed.STDNAME = "MAIN STREET"

Note that the value of the STDNAME option is returned in uppercase. You can also pass an array of address strings to the Match method.
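
The Python sketch below models the rule defined above so that its behavior can be tried outside Maptitude; it is not the RegEx class. It assumes that a rule must match the whole input, that letters are matched regardless of case (modeled by lowercasing the input before matching), and that an unmatched string yields no options (None here). The function name match is hypothetical.

```python
import re

# Python analogue (assumption) of the fields and rule defined above:
# "number" is [0-9]+, "words" is [a-z ]+, and the rule
# "$number:(STDNUMBER) $words:(STDNAME)" puts a space between them.
RULE = re.compile(r"(?P<STDNUMBER>[0-9]+) (?P<STDNAME>[a-z ]+)")

def match(strings):
    """Return an options dict for one string, or a list of them for a list."""
    if isinstance(strings, list):
        return [match(s) for s in strings]
    m = RULE.fullmatch(strings.lower())   # lowercase so [a-z] accepts any case
    if m is None:
        return None                       # assumption: no match, no options
    return {name: value.upper() for name, value in m.groupdict().items()}

print(match("139 main street"))
# {'STDNUMBER': '139', 'STDNAME': 'MAIN STREET'}
print(match(["139 main street", "144 mason terrace"]))
```

Passing a list returns one result per input string, in the same order.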

 

If you want to extract only one component from an address, you can create a simple "one-rule" regular expression like this:

rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)")

Then you can match it and get just the name:

result = rx.Match("144 Mason Terrace","name")

The result is the string "MASON TERRACE".
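
The effect of the option argument can be modeled in Python with named groups, one per {pattern}:(name) pair. This is an assumption-based stand-in, not the GISDK implementation; lowercasing the input is an assumption that explains why the [a-z] pattern accepts "Mason", and the function name match is hypothetical.

```python
import re

# Python stand-in (assumption) for
# CreateObject("RegEx", "{[0-9]+}:(number) {[a-z ]+}:(name)"):
# each {pattern}:(name) becomes a named group.
ONE_RULE = re.compile(r"(?P<number>[0-9]+) (?P<name>[a-z ]+)")

def match(text, option=None):
    """Return all options, or only the value of option when one is named."""
    m = ONE_RULE.fullmatch(text.lower())  # lowercase so [a-z] accepts "Mason"
    if m is None:
        return None
    options = {key: value.upper() for key, value in m.groupdict().items()}
    return options if option is None else options[option]

print(match("144 Mason Terrace", "name"))  # MASON TERRACE
```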

 

You can use the MatchView method to apply the rule to all the addresses in a view:

// Open Customer.dbf in the Tutorial folder before using the MatchView method.
// The output is a table with three columns: ID, STDNUMBER and STDNAME.
table = rx.MatchView(GetView()+"|",{"ADDRESS"},"ID","standardized.bin")
// Display the output table in an editor and set its row background color.
CreateEditor(table, table+"|",,{{"Row Background", "True"}})
SetEditorOptionEx(,{{"Row Background Color", ColorRGB(59000, 59000, 59000)}})
RedrawEditor()
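
MatchView produces one output row per input record. The following Python sketch mimics that flow on hypothetical in-memory records: it parses one address field per record, collects (ID, STDNUMBER, STDNAME) rows, and writes them as CSV text instead of a BIN table. How MatchView handles unmatched records and multiple input fields is not described in this topic, so the sketch assumes a single input field and leaves the output fields empty when the rule does not match. The names match_view and customers are illustrative.

```python
import csv
import io
import re

# Illustrative Python rule (an assumption, not the GISDK rule syntax).
RULE = re.compile(r"(?P<STDNUMBER>[0-9]+) (?P<STDNAME>[a-z ]+)")

def match_view(records, input_field, id_field):
    """Return one (ID, STDNUMBER, STDNAME) row per record.

    Simplifications: a single input field, and empty output fields when the
    rule does not match.
    """
    rows = []
    for record in records:
        m = RULE.fullmatch(record[input_field].lower())
        if m:
            rows.append((record[id_field], m["STDNUMBER"], m["STDNAME"].upper()))
        else:
            rows.append((record[id_field], "", ""))
    return rows

# Hypothetical records standing in for the Customer table.
customers = [
    {"ID": 1, "ADDRESS": "139 Main Street"},
    {"ID": 2, "ADDRESS": "144 Mason Terrace"},
    {"ID": 3, "ADDRESS": "PO Box 12"},
]
table = match_view(customers, "ADDRESS", "ID")

# Write the rows as CSV text; MatchView itself writes a BIN table.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["ID", "STDNUMBER", "STDNAME"])
writer.writerows(table)
print(out.getvalue())
```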

 

Fields and rules are applied in the order in which they are declared. The first rule that matches successfully determines the output options array or table.
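
The first-match-wins behavior can be sketched in Python as a loop over rules in declaration order. The two patterns and the function name first_match are illustrative Python constructs, not GISDK rules; the example shows that putting a catch-all rule first would hide the more specific one.

```python
import re

SPECIFIC = re.compile(r"(?P<STDNUMBER>[0-9]+) (?P<STDNAME>[a-z ]+)")
CATCH_ALL = re.compile(r"(?P<STDNAME>.+)")

def first_match(text, rules):
    """Try rules in order and return the options of the first full match."""
    for rule in rules:
        m = rule.fullmatch(text.lower())
        if m:
            return {key: value.upper() for key, value in m.groupdict().items()}
    return None   # no rule matched

print(first_match("139 main street", [SPECIFIC, CATCH_ALL]))
# {'STDNUMBER': '139', 'STDNAME': 'MAIN STREET'}
print(first_match("139 main street", [CATCH_ALL, SPECIFIC]))
# {'STDNAME': '139 MAIN STREET'}
```

Declaring the most specific rules first and the most general ones last keeps every rule reachable.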

 

The regular expression syntax is as follows:

Item   Description
{}     Group
<      Beginning of line
>      End of line
|      Alternative; note that "abc|def" represents "abc" or "def" while "ab{c|d}ef" represents "abcef" or "abdef"
*      Zero or more of the previous item
+      One or more of the previous item
?      Zero or one of the previous item
.      Any single character
[ ]    Character set; e.g., [0-9] matches any digit and [a-z] matches any letter
[~]    Negated character set; matches any character not in the set
\n     Newline
\      Escape
$      Field starter; e.g., $number refers to the field named number
:      Field assignment
()     Field name; e.g., {[0-9]+}:(num) assigns any sequence of digits to num
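
Several items in this syntax correspond to standard regular-expression constructs. The Python sketch below is a hypothetical translator, based only on the table above, for the subset that does not involve fields: {} becomes a non-capturing group, < and > become ^ and $, [~...] becomes [^...], and \ escapes the next character (\n is a newline). It rejects the field syntax ($, :, ()), which has no direct Python equivalent. The function name to_python_regex is illustrative, and exact GISDK semantics, such as whether . matches a newline, may differ.

```python
import re

def to_python_regex(pattern):
    """Translate the non-field subset of the syntax table into a Python regex."""
    out = []
    in_set = False          # True while inside [...] or [~...]
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \n is a newline; any other escaped character is taken literally.
            out.append("\n" if nxt == "n" else re.escape(nxt))
            i += 2
            continue
        if in_set:
            if c == "]":
                in_set = False
            out.append(c)
        elif c == "[":
            in_set = True
            if pattern.startswith("[~", i):
                out.append("[^")    # negated character set
                i += 2
                continue
            out.append("[")
        elif c == "{":
            out.append("(?:")       # group
        elif c == "}":
            out.append(")")
        elif c == "<":
            out.append("^")         # beginning (per line only with re.MULTILINE)
        elif c == ">":
            out.append("$")         # end
        elif c in "$:()":
            raise ValueError("field syntax is outside this sketch: " + c)
        else:
            # Operators keep their meaning; any other character is a literal.
            out.append(c if c in "|*+?." else re.escape(c))
        i += 1
    return "".join(out)

print(to_python_regex("ab{c|d}ef"))   # ab(?:c|d)ef
print(to_python_regex("<[~0-9]+>"))   # ^[^0-9]+$
```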

 

The RegEx object has the following methods; the examples assume the object rx has been created:

Delimiters(delimiters)

Description:

Sets each character in the string delimiters as a word delimiter.

Arguments:

delimiters – a string containing the delimiter characters

Example:

rx.Delimiters(" \n") sets a space and a newline character as delimiters
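
One plausible reading of Delimiters is that any run of delimiter characters separates two words. The Python sketch below implements that reading; it is an assumption for illustration, not a description of how the RegEx class tokenizes its input, and the function name split_words is hypothetical.

```python
import re

def split_words(text, delimiters):
    """Split text into words at any run of delimiter characters (a model)."""
    if not delimiters:
        # With no delimiters, the whole input is a single word.
        return [text] if text else []
    # Build a character class from the escaped delimiter characters.
    pattern = "[" + "".join(re.escape(ch) for ch in delimiters) + "]+"
    # re.split leaves empty strings at leading or trailing delimiters; drop them.
    return [word for word in re.split(pattern, text) if word]

print(split_words("139 main street\n", " \n"))  # ['139', 'main', 'street']
```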

 

ReplaceChar(list_of_characters)

ReplaceWith(replacement_characters)

Description:

Before the input expression is matched, replaces each character listed in the string list_of_characters with the character at the same position in the string replacement_characters.

Arguments:

list_of_characters – a string with the characters to replace

replacement_characters – a string with the replacement characters, in the same order as list_of_characters

Example:

rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-") lists the characters to replace

rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ") gives the non-accented replacements for those characters, and a space for the hyphen
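
In the example, the two strings have the same length and each character lines up with its replacement, which is the mapping Python's str.maketrans builds. The sketch below reuses the two strings from the example; normalizing to Unicode NFC is an added assumption that keeps each accented letter a single character so the positions line up. The names SOURCE, TARGET, and replace_chars are illustrative.

```python
import unicodedata

# Character lists copied from the ReplaceChar/ReplaceWith example above.
SOURCE = unicodedata.normalize("NFC", "àáâãåäæéèêëíîïóöôõœùûúüçñ-")
TARGET = "aaaaaaeeeeeiiiooooeuuuucn "

# str.maketrans maps SOURCE[i] to TARGET[i]; it raises ValueError if the
# two strings have different lengths.
TABLE = str.maketrans(SOURCE, TARGET)

def replace_chars(text):
    """Apply the one-for-one character replacement before matching (a model)."""
    return unicodedata.normalize("NFC", text).translate(TABLE)

print(replace_chars("àáâãåäæéèêëíî avenue"))  # aaaaaaeeeeeii avenue
print(replace_chars("saint-jean street"))     # saint jean street
```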

 

Field(name, pattern[, replacement])

Description:

Defines a field called name that matches the regular expression pattern.

Arguments:

name – a string with the field name

pattern – a string with the regular expression for the field

replacement – optional, a string with the replacement for name

Examples:

rx.Field("number","[0-9]+") defines the field number as one or more digits

rx.Field("words","[a-z ]+") defines the field words as one or more letters or spaces
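
The complete example at the end of this topic builds one field from others with $ references (sttype is built from the street-type fields). The following Python class is a hypothetical model of that idea: each $name in a new pattern is replaced by the already-defined pattern for name, wrapped in a non-capturing group, so a field must be declared before it is referenced. The class and method names are illustrative, and the trailing "/" in field names such as "ave/" in the complete example is not modeled.

```python
import re

class Fields:
    """Hypothetical model of RegEx.Field(): $name in a pattern is replaced by
    the pattern of an earlier field, so fields resolve in declaration order."""

    REF = re.compile(r"\$(\w+)")

    def __init__(self):
        self.patterns = {}   # field name -> fully expanded Python regex

    def field(self, name, pattern):
        # Wrap each referenced pattern in a non-capturing group so that its
        # alternatives stay together; an undeclared name raises KeyError.
        expanded = self.REF.sub(
            lambda m: "(?:" + self.patterns[m.group(1)] + ")", pattern)
        self.patterns[name] = expanded
        return expanded

fields = Fields()
fields.field("ave", "ave|avenue")
fields.field("rd", "rd|road")
fields.field("st", "st|street")
print(fields.field("sttype", "$ave|$rd|$st"))
# (?:ave|avenue)|(?:rd|road)|(?:st|street)
```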

Rule(pattern)

Description:

Defines a rule based on the regular expression pattern.

Arguments:

pattern – a string with the regular expression for the rule

Returned value:

An options array with the defined options

Example:

rx.Rule("$number:(STDNUMBER) $words:(STDNAME)") defines a rule that puts the number into the STDNUMBER option and the words into the STDNAME option

 

GetRules([recompile])

Description:

Gets the delimiters, fields, and rules for the object.

Arguments:

recompile – optional, a Boolean value; True to recompile the rules

Returned value:

An array with the delimiters, fields, and rules

Example:

rx.GetRules() returns the current delimiters, fields, and rules for the rx object

 

Match(strings[, option])

Description:

Parses strings based on the rules.

Arguments:

strings – one string or an array of strings to parse

option – optional, the name of an option; if specified, Match returns just the value of that option as a string

Returned value:

An options array with the defined options or, if option is specified, a string with the value of that option

Examples:

rx.Match("139 main street") returns "139" in the STDNUMBER option and "MAIN STREET" in the STDNAME option

rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)") and then

result = rx.Match("144 Mason Terrace","name") returns "MASON TERRACE".

MatchView(view_set, input_fields, id_field, output_bin_file)

Description:

Applies the rules to the address field(s) of the records in a view or selection set and writes the results to a table in a BIN file.

Arguments:

view_set – a string with the view and set, in the form "view|set"; leave the set name empty to use all records

input_fields – an array of strings with the address field name(s)

id_field – a string with the ID field name

output_bin_file – a string with the output BIN file name

Returned value:

A string with the name of the view created from the output BIN file

Example:

rx.MatchView(GetView()+"|",{"ADDRESS"},"ID","standardized.bin") uses all the records in the current view, parses the ADDRESS field into STDNUMBER and STDNAME, uses the ID field as the ID, and saves the result in the table standardized.bin

Recompile()

Description:

Runs the GetRules method with recompile = True.

Returned value:

An array with the delimiters, fields, and rules

Example:

rx.Recompile() recompiles and returns the current delimiters, fields, and rules for the rx object

 

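Here is a complete example:

rx = CreateObject("RegEx")
// Treat spaces and punctuation as word delimiters.
rx.Delimiters(". ;:?[]*=()#,%!~\"+{}")
// Replace accented characters and hyphens before matching.
rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-")
rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ")
// Define street-type fields, a number, words, and a street type built from the earlier fields.
rx.Field("ave/","ave|avenue")
rx.Field("rd/","rd|road")
rx.Field("st/","st|street")
rx.Field("number","[0-9]+")
rx.Field("words","[a-z ]+")
rx.Field("sttype","$ave|$rd|$st")
// Rules are tried in order; the first one that matches produces the output.
rx.Rule("$number:(STDNUMBER) $words:(STDNAME) $sttype:(STDNAME)")
rx.Rule("$number:(STDNUMBER) $sttype:(STDNAME) $words:(STDNAME)")
rx.Rule("$sttype:(STDNAME) $words:(STDNAME)")
rx.Rule("$words:(STDNAME) $sttype:(STDNAME)")
// Show the current delimiters, fields, and rules.
ShowArray(rx.GetRules())
// Parse several addresses at once and show the inputs alongside the results.
sentences = {"àáâãåäæéèêëíî avenue","123 àáâãåäæéèêëíî avenue","132 avenue of the americas","123 a avenue"}
result = rx.Match(sentences)
ShowArray({sentences,result})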
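
To experiment with the complete example outside Maptitude, the following Python sketch models each step: character replacement, delimiter splitting, field expansion, and first-match rule evaluation. It is not the RegEx implementation. It assumes lowercase matching with uppercase output, drops the "/" from the street-type field names, and joins the values of an option that appears more than once in a rule (STDNAME here) with a space; the real object may combine or report them differently. All function names are illustrative.

```python
import re
import unicodedata

# Character replacement, as in ReplaceChar/ReplaceWith (position by position).
TABLE = str.maketrans(unicodedata.normalize("NFC", "àáâãåäæéèêëíîïóöôõœùûúüçñ-"),
                      "aaaaaaeeeeeiiiooooeuuuucn ")
# Any run of these delimiter characters separates words, as in Delimiters().
DELIMS = re.compile("[" + re.escape('. ;:?[]*=()#,%!~"+{}') + "]+")

def clean(text):
    """Lowercase, replace characters, and rejoin the words with single spaces."""
    text = unicodedata.normalize("NFC", text).lower().translate(TABLE)
    return " ".join(word for word in DELIMS.split(text) if word)

fields = {}                                    # field name -> expanded regex
rules = []                                     # (compiled rule, option names)
FIELD_REF = re.compile(r"\$(\w+)")             # $name inside a field
RULE_SLOT = re.compile(r"\$(\w+):\((\w+)\)")   # $field:(OPTION) inside a rule

def field(name, pattern):
    fields[name] = FIELD_REF.sub(lambda m: "(?:" + fields[m.group(1)] + ")", pattern)

def rule(pattern):
    options = []
    def slot(m):
        options.append(m.group(2))
        return "(" + fields[m.group(1)] + ")"  # one capturing group per slot
    rules.append((re.compile(RULE_SLOT.sub(slot, pattern)), options))

def match(text):
    """Return the options of the first rule that matches the cleaned text."""
    text = clean(text)
    for regex, options in rules:
        m = regex.fullmatch(text)
        if m is None:
            continue
        result = {}
        for option, value in zip(options, m.groups()):
            # Assumption: values of a repeated option are joined with a space.
            value = value.upper()
            result[option] = result[option] + " " + value if option in result else value
        return result
    return None

field("ave", "ave|avenue")
field("rd", "rd|road")
field("st", "st|street")
field("number", "[0-9]+")
field("words", "[a-z ]+")
field("sttype", "$ave|$rd|$st")
rule("$number:(STDNUMBER) $words:(STDNAME) $sttype:(STDNAME)")
rule("$number:(STDNUMBER) $sttype:(STDNAME) $words:(STDNAME)")
rule("$sttype:(STDNAME) $words:(STDNAME)")
rule("$words:(STDNAME) $sttype:(STDNAME)")

sentences = ["àáâãåäæéèêëíî avenue", "123 àáâãåäæéèêëíî avenue",
             "132 avenue of the americas", "123 a avenue"]
for s in sentences:
    print(s, "->", match(s))
```

Under these assumptions, "132 avenue of the americas" fails the first rule, because it does not end with a street type, and is parsed by the second rule instead.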

 

 

©2025 Caliper Corporation www.caliper.com