When you enter a new business lead record, the same record may already
exist in the database. In that case the new record can be treated as a
duplicate. The duplicate check is a process that runs automatically in
the background and identifies similar records that already exist
in the database. This activity is used to
Single Hit: Only one column in the rule
needs to match for a record to be marked as a duplicate.
Combined Hit: All the columns in the rule need to match for a record to
be marked as a duplicate.
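The difference between the two hit types can be sketched in a few lines of Python. This is only an illustration: the function name, the "single"/"combined" mode strings, and the per-column boolean results are hypothetical and are not taken from the product.

```python
from typing import Dict


def is_duplicate(column_matches: Dict[str, bool], hit_mode: str) -> bool:
    """Decide whether a candidate record counts as a duplicate.

    column_matches maps each column in the rule to whether that column
    matched. hit_mode is "single" or "combined" (hypothetical names).
    """
    if not column_matches:
        return False  # assumption: a rule without columns flags nothing
    if hit_mode == "single":
        # Single Hit: one matching column is enough.
        return any(column_matches.values())
    if hit_mode == "combined":
        # Combined Hit: every column in the rule must match.
        return all(column_matches.values())
    raise ValueError(f"unknown hit mode: {hit_mode!r}")


matches = {"name": True, "phone": False}
print(is_duplicate(matches, "single"))    # True
print(is_duplicate(matches, "combined"))  # False
```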
Algorithms:
A different type of algorithm can be defined for each column in the rule
to search for duplicates. The available algorithms are described below.
Exact:
Compares two values and returns either 0 or 1.
Exact for all characters:
Converts all characters to lower case, compares the two values and
returns either 0 or 1.
Exact for numbers:
Removes all non-digit characters, compares the two values and returns
either 0 or 1. Intended for values such as telephone numbers.
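The three exact variants differ only in how the values are normalized before comparison. A minimal Python sketch follows. The function names are hypothetical, and it assumes that 1 means the values match and 0 means they do not, which the text above does not state explicitly.

```python
import re


def exact(a: str, b: str) -> int:
    """Compare the values as they are. Assumption: 1 = match, 0 = no match."""
    return 1 if a == b else 0


def exact_all_characters(a: str, b: str) -> int:
    """Lower-case both values before comparing."""
    return exact(a.lower(), b.lower())


def exact_numbers(a: str, b: str) -> int:
    """Strip every non-digit character before comparing."""
    return exact(re.sub(r"[^0-9]", "", a), re.sub(r"[^0-9]", "", b))


print(exact("Acme", "ACME"))                            # 0
print(exact_all_characters("Acme", "ACME"))             # 1
print(exact_numbers("(555) 123-4567", "555.123.4567"))  # 1
```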
Distance:
Uses the Oracle function UTL_MATCH.EDIT_DISTANCE.
This compares two values and returns the distance between them, a way of
quantifying how dissimilar two strings are. The distance is the minimum
number of operations required to transform one string into the other,
where an operation is an insertion, deletion or substitution of a single
character.
Examples:
“Michael” vs “Michae” results in 1 (an “l” has to be inserted at the end)
“Michael” vs “Michaell” results in 1 (an “l” has to be deleted at the end)
“Michael” vs “Nichael” results in 1 (“M” has to be substituted with “N”
at the beginning)
Distance for all characters:
Keeps the strings as they are (no conversion to lower case before
comparison) and returns the distance between the two values.
Distance for numbers:
Removes all non-digit characters and returns the distance between the
two values.
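To make the distance measure concrete, the following Python sketch computes the classic Levenshtein edit distance and reproduces the three “Michael” examples. It is an illustration of the technique, not the Oracle implementation behind UTL_MATCH.EDIT_DISTANCE, and the function names are hypothetical.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn a into b (Levenshtein distance)."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def distance_numbers(a: str, b: str) -> int:
    """Strip every non-digit character, then measure the distance."""
    return edit_distance(re.sub(r"[^0-9]", "", a), re.sub(r"[^0-9]", "", b))


print(edit_distance("Michael", "Michae"))        # 1
print(edit_distance("Michael", "Michaell"))      # 1
print(edit_distance("Michael", "Nichael"))       # 1
print(distance_numbers("555-1234", "555 1235"))  # 1
```

Because the strings are compared as they are, a difference in case counts as a substitution: edit_distance("Michael", "michael") is 1 in this sketch.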
Fuzzy:
Uses the Oracle function UTL_MATCH.JARO_WINKLER_SIMILARITY. This
calculates the measure of agreement between two strings and returns a
score between 0 (no match) and 100 (perfect match).
Fuzzy for all characters:
Converts all characters to lower case, compares the two values and
returns a number between 0 and 100.
Fuzzy for numbers:
Removes all non-digit characters, compares the two values and returns a
number between 0 and 100.
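The fuzzy score can be illustrated with a textbook Jaro-Winkler implementation in Python. This is a sketch under assumptions, not the Oracle code: the function names are hypothetical, the prefix scale of 0.1 and the prefix limit of 4 characters are the usual textbook values, and rounding to the nearest integer is a guess, so scores may differ slightly from UTL_MATCH.JARO_WINKLER_SIMILARITY.

```python
import re


def jaro(a: str, b: str) -> float:
    """Textbook Jaro similarity in the range 0.0 to 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_used = [False] * len(a)
    b_used = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_used[j] and b[j] == ch:
                a_used[i] = b_used[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_matched = [ch for ch, used in zip(a, a_used) if used]
    b_matched = [ch for ch, used in zip(b, b_used) if used]
    transpositions = sum(x != y for x, y in zip(a_matched, b_matched)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def fuzzy(a: str, b: str) -> int:
    """Jaro-Winkler score scaled to 0..100 (rounding is an assumption)."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):  # common prefix, at most 4 characters
        if x != y:
            break
        prefix += 1
    return round((score + prefix * 0.1 * (1 - score)) * 100)


def fuzzy_all_characters(a: str, b: str) -> int:
    """Lower-case both values before scoring."""
    return fuzzy(a.lower(), b.lower())


def fuzzy_numbers(a: str, b: str) -> int:
    """Strip every non-digit character before scoring."""
    return fuzzy(re.sub(r"[^0-9]", "", a), re.sub(r"[^0-9]", "", b))


print(fuzzy("MARTHA", "MARHTA"))                # 96
print(fuzzy_all_characters("Michael", "MICHAEL"))  # 100
print(fuzzy("abc", "xyz"))                      # 0
```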
There are no prerequisites for this activity.