Maptitude GISDK Help |
You can standardize a single address, or addresses in a field in a view:
GISDK Function |
Summary |
Converts an address string into a normalized form for address matching, according to a file of transformation rules |
|
Standardizes address strings in a view and writes the results to a table |
You can also use regular expressions to parse street addresses into components. The RegEx class extracts components from an input string and puts them into an options array. For example, the following script creates a RegEx object, sets a delimiter, defines fields for a number and for a set of words, and defines a rule to parse an address:
// Create a RegEx gisdk object.
rx = CreateObject("RegEx")
// Specify the word delimiter
rx.Delimiters(" ")
// Define a number as a sequence of one or more digits
rx.Field("number","[0-9]+")
// Define words as a sequence of one or more letters or spaces
rx.Field("words","[a-z ]+")
// Create a match rule for a number followed by one or more words.
// The number will be stored in the output option called STDNUMBER,
// and the words in the option called STDNAME.
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")
You can apply the rule to one address:
parsed = rx.Match("139 main street")
ShowArray(parsed)
The options array contains:
parsed.STDNUMBER = "139"
parsed.STDNAME = "MAIN STREET"
Note that the value of the STDNAME option is converted to uppercase. You can also pass an array of address strings to the Match method.
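As a minimal sketch of the array form, the following lines reuse the rx object defined above and pass several addresses in one call. The sample address strings are illustrative assumptions, and the exact layout of the returned array is not described here, so the sketch only displays it with ShowArray:

```gisdk
// Assumes rx already has the delimiter, fields, and rule defined above.
// The address strings below are made-up samples.
addresses = {"139 main street", "22 elm road", "7 oak lane"}
// Pass the whole array to Match instead of a single string.
parsed = rx.Match(addresses)
// Show the inputs next to the parsed results for inspection.
ShowArray({addresses, parsed})
```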
If you want to extract only one component from an address, you can create a simple "one-rule" regular expression like this:
rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)")
Then you can match it and get just the name:
result = rx.Match("144 Mason Terrace","name")
The result is the string "MASON TERRACE".
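The same one-rule expression should also let you ask for the other component. The sketch below requests the number option instead of name; the variable name house_number is an assumption for illustration, and the value is displayed rather than stated because it has not been verified here:

```gisdk
// Same one-rule expression as above; only the requested option name changes.
rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)")
// Ask Match for the "number" option rather than the "name" option.
house_number = rx.Match("144 Mason Terrace","number")
ShowMessage(house_number)
```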
You can use the MatchView method to apply the rule to a batch of addresses:
// Open Customer.dbf in the Tutorial folder before using the MatchView method.
// The output is a table with three columns: ID, STDNUMBER and STDNAME.
table = rx.MatchView(GetView()+"|",{"ADDRESS"},"ID","standardized.bin")
CreateEditor(table, table+"|",,{{"Row Background", "True"}})
SetEditorOptionEx(,{{"Row Background Color", ColorRGB(59000, 59000, 59000)}})
RedrawEditor()
Fields and rules are applied in the order in which they are declared. The first rule that matches successfully is the one that returns the output options array or table.
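One way to use rule order is to declare the most specific rule first and a more general fallback after it. The following is a sketch under assumptions: the single-field fallback rule and the sample strings are illustrative, not taken from the product documentation, so inspect the output with ShowArray before relying on it:

```gisdk
// Sketch only: the fallback rule and the sample strings are assumptions.
rx = CreateObject("RegEx")
rx.Delimiters(" ")
rx.Field("number","[0-9]+")
rx.Field("words","[a-z ]+")
// Declared first, so it is tried first: a number followed by words.
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)")
// Declared second: a fallback for strings with no leading number.
rx.Rule("$words:(STDNAME)")
// Display what each input produces.
ShowArray(rx.Match("139 main street"))
ShowArray(rx.Match("main street"))
```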
The regular expression syntax is as follows:
Item | Description |
{} | Group |
< | Beginning of line |
> | End of line |
| | Alternative; note that "abc|def" represents "abc" or "def" while "ab{c|d}ef" represents "abcef" or "abdef" |
* | Zero or more of previous match |
+ | One or more of previous match |
? | Zero or one of previous match |
. | Any single char |
[ ] | Charset; e.g. [0-9] is all digits and [a-z] is all letters |
[~] | Not charset |
\n | Newline |
\ | Escape |
$ | Field starter |
: | Field assignment |
() | Field name; e.g., {[0-9]+}:(num) will assign any number to num |
The RegEx object has the following methods; the examples assume the object rx has been created:
Delimiters(delimiters)
Description: |
Sets the characters in the string delimiters as delimiters. |
Arguments: |
delimiters – a string with the delimiter characters |
Example: |
rx.Delimiters(" \n") sets a space or a newline character as delimiters |
ReplaceChar(list_of_characters)
ReplaceWith(replacement_characters)
Description: |
Replaces all of the characters listed in the string list_of_characters with the characters listed in the string replacement_characters before matching the input expression. |
Arguments: |
list_of_characters – a string with the characters to replace |
|
replacement_characters – a string with the replacement characters |
Example: |
rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-") lists the characters to replace |
|
rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ") gives the non-accented replacements for those characters |
Field(name, pattern[, replacement])
Description: |
Defines the field name based on the regular expression pattern. |
Arguments: |
name – a string with the field name |
|
pattern – a string with the regular expression for the field |
|
replacement – optional, a string with the replacement for name |
Examples: |
rx.Field("number","[0-9]+") defines the field number as one or more digits |
|
rx.Field("words","[a-z ]+") defines the field words as a sequence of one or more letters or spaces |
Rule(pattern)
Description: |
Defines a rule based on the regular expression pattern. |
Arguments: |
pattern – a string with the regular expression for the rule |
Returned value: |
An options array with the defined options |
Example: |
rx.Rule("$number:(STDNUMBER) $words:(STDNAME)") defines a rule that puts the number into the STDNUMBER option and the words into the STDNAME option |
GetRules(recompile)
Description: |
Gets the delimiters, fields, and rules for the object. |
Arguments: |
recompile – a Boolean value, True to recompile the rules |
Returned value: |
An array with the delimiters, fields, and rules |
Example: |
rx.GetRules() returns the current delimiters, fields, and rules for the rx object |
Match(strings[, option])
Description: |
Parses strings based on the rules. |
Arguments: |
strings – one string or an array of strings to parse |
|
option – optional, a string with the name of a rule option; when given, Match returns just the value of that option as a string |
Returned value: |
An options array with the defined options, or a string with the result of the rule |
Examples: |
rx.Match("139 main street") returns "139" in the STDNUMBER option and "MAIN STREET" in the STDNAME option |
|
rx = CreateObject("RegEx","{[0-9]+}:(number) {[a-z ]+}:(name)") and then result = rx.Match("144 Mason Terrace","name") returns "MASON TERRACE" |
MatchView(view_set, input_fields, id_field, output_bin_file)
Description: |
Parses the strings in the input fields of a view based on the rules and writes the results to a BIN table. |
Arguments: |
view_set – a string with the view and set |
|
input_fields – an array of strings with the address field name(s) |
|
id_field – a string with the ID field name |
|
output_bin_file – a string with the output BIN file name |
Returned value: |
A string with the name of the view created from the output BIN file |
Example: |
rx.MatchView(GetView()+"|",{"ADDRESS"},"ID","standardized.bin") uses all the records in the current view, parses the ADDRESS field into STDNUMBER and STDNAME, uses the ID field as the ID, and saves the result into the table standardized.bin |
Recompile()
Description: |
Runs the GetRules method with recompile = True. |
Returned value: |
An array with the delimiters, fields, and rules |
Example: |
rx.Recompile() recompiles and returns the current delimiters, fields, and rules for the rx object |
Here is a complete example:
rx = CreateObject("RegEx")
rx.Delimiters(". ;:?[]*=()#,%!~\"+{}")
rx.ReplaceChar("àáâãåäæéèêëíîïóöôõœùûúüçñ-")
rx.ReplaceWith("aaaaaaeeeeeiiiooooeuuuucn ")
rx.Field("ave/","ave|avenue")
rx.Field("rd/","rd|road")
rx.Field("st/","st|street")
rx.Field("number","[0-9]+")
rx.Field("words","[a-z ]+")
rx.Field("sttype","$ave|$rd|$st")
rx.Rule("$number:(STDNUMBER) $words:(STDNAME) $sttype:(STDNAME)")
rx.Rule("$number:(STDNUMBER) $sttype:(STDNAME) $words:(STDNAME)")
rx.Rule("$sttype:(STDNAME) $words:(STDNAME)")
rx.Rule("$words:(STDNAME) $sttype:(STDNAME)")
ShowArray(rx.GetRules())
sentences = {"àáâãåäæéèêëíî avenue","123 àáâãåäæéèêëíî avenue","132 avenue of the americas","123 a avenue"}
result = rx.Match(sentences)
ShowArray({sentences,result})
©2025 Caliper Corporation | www.caliper.com |