Overview
Experian Match is an innovative data matching and linkage solution, working alongside your enterprise technology and applications of choice to bring disparate identities together.
Experian Match supports the creation and maintenance of a single customer view:
Establish

- Locate duplicates within existing systems
- Establish linkage across data silos
- Supplement decisioning and risk management with data enrichment

Maintain

- Prevent creation of duplicates at source
- Trap and cascade address changes

Real-time search

- 360-degree view of your customers
- Improve real-time personalisation
- Service subject access requests quickly
Concepts
Experian Match is powerful and highly configurable. To make the most of the product, we recommend you familiarise yourself with the core concepts.
Matching records
As the name suggests, the concept of matching records is the core principle behind Experian Match. Matching itself is the process of identifying which records are similar enough to be considered the same entity. This process involves standardising the input, blocking together similar records and comparing them based on configurable matching rules. The result of this is a collection of clusters containing matched records.
Match levels
Each match between two records can have one of four match levels. The default matching rules define the match levels as follows:
Exact
Each individual field that makes up the record matches exactly.
Close
Records might have some fields that match exactly and some fields that are very similar.
Probable
Records might have some fields that match exactly, some fields that are very similar and some fields that differ a little more.
Possible
Records contain the majority of fields that have a number of similarities but do not match exactly.
Data Configuration
Experian Match can connect and output to multiple data sources and data types. You will need to configure these connections as part of setting up your session.
The available data types are:
- JDBC
- Mongo
- Flat file, e.g. CSV
- JMS
Data source
In order for Experian Match to start work on your data, you need to configure your data sources. Experian Match will need to know how to connect to your data source (including authentication information) and which data from the source to match on.
Match output
You will need to configure your data output if you want to output the results of your Matching jobs. Output configuration requires connection information to your database along with the fields you wish to output.
Experian Match provides four result fields that can be output alongside the source data which show the results of a matching job:
Field | Field description |
---|---|
$CLUSTER_ID | The Cluster ID generated for the record by Matching |
$MATCH_STATUS | The score associating the record to the cluster, e.g. Exact, Close, Probable, Possible, None |
$SOURCE_ID | The configured Source ID from which a record originated |
$RECORD_ID | The unique Record ID of the record as used in matching |
Matching logic
Experian Match provides complete control over the matching logic and the way records are matched.
This can be configured using blocking keys and rules:
Blocking keys
Experian Match creates blocks of similar records to assist with the generation of suitable candidate record pairs for scoring. Blocks are created from records that have the same blocking key values. Blocking keys are created for each input record from combinations of the record’s elements that have been keyed. Keying is the process of encoding individual elements to the same representation so that they can be matched despite minor differences in spelling.
A default set of best practice blocking keys is provided with the software for name and address data. These can be acquired by a call to the REST API. To use them, modify them to match your input data and requirements, then submit them via the REST API.
Rules
Before setting up your match session, a rule set must be configured. A rule set is a set of logical expressions (rules), written in our own Domain Specific Language (DSL), which control how records are compared and how match levels are decided. We have designed the rule DSL to give you complete control over how records match.
A default best practice rule set is provided with the software for name and address data. This can be adjusted for optimal matching depending on your data and requirements.
Match store
The match store is created and configured when you set up your session. When you run a match job, the relevant match store will be populated or updated depending on the session. The store contains the newly created clusters of records. Performing an output request will output the cluster IDs from the match store to your desired location.
Clusters
A cluster is a collection of records that have been identified as representing the same entity using the rules that you have provided.
Scenarios
Performing a match and output job
The steps below outline configuring a matching session, using this session to run a match job and outputting the clustered records. As part of this process a match store is established.
Each step will return an ID in the response; these IDs will need to be used in subsequent steps to configure and perform a matching job.
The steps are:
1. Create a data source configuration to set where your records are located.
2. Create an output configuration to set where you want the clustered records to be written to.
3. Create a rule configuration to decide how you want to match your records.
4. Create blocking key configurations.
5. Create a session configuration to connect your configuration objects together.
6. Run a match job and output the results.
Follow the match and output tutorial for a detailed run through, and example requests.
Perform maintenance on data in the match store
The steps below outline the tasks required to maintain data in the match store. Maintenance might add, update or delete any of the records in a transactional manner, and allows the match store to be brought into line with changes to the source data. The data source for the update must not be a flat file.

1. Run a match job following the Match and Output scenario to establish a match store that can be maintained.
2. Add, update or delete a record from the match store. Each transactional update will re-cluster any affected match store records.
Once you’ve made your maintenance update, you can trigger actions in the matching system as the data in your data source changes.
Search the match store for a target record
The steps below outline the required tasks to search the match store to find records that could potentially match against your target record. This is particularly useful for checking whether similar records exist before entering a new record into your database.
The steps are:
1. Run a match job following the Match and Output scenario to establish a match store that can be searched.
2. Search the built match store for a target record. The API will return a collection of records along with the match status, allowing you to make a decision on what to do with your target record.
Follow the match store searching tutorial for a more detailed run through.
For any issues with running a matching job, check the troubleshooting section.
Installing
Requirements
In order to deploy Experian Match you must deploy the API under an application server, install and configure the Standardisation service, and install your database drivers. The system requires access to a JDBC compliant database to store its index tables. You may use your existing infrastructure, or deploy a dedicated instance as you wish.
Software requirements
- Windows Server 2012 or greater
- .NET Framework 4.5
- Java JRE 8
- Application server capable of deploying a war file
  - Apache Tomcat 8.5 is recommended
Application server
The matching system is highly multi-threaded and will benefit from running on a machine with multiple cores. Minimum and recommended hardware specifications are as follows.
Minimum:

- 2 CPU Cores @ 2GHz+
- 8GB RAM
- 100MB HDD Space (to install Standardisation reference data for only 1 country)

Recommended:

- 8 CPU Cores @ 2GHz+
- 32GB RAM
- 600MB Disk (to install Standardisation reference data for all countries)
Installing Standardize
Experian Match uses an external service, GdqStandardizeServer, to perform input standardisation. This service must be running for Experian Match to work correctly.
Setup
Prior to installing or starting the service, data should be installed.
1. Copy the Standardize directory to a location of your choosing; we recommend C:\Program Files\Experian\Standardize.
Data installation
By default, GdqStandardizeServer data should be stored in a folder named Data within the Standardize install directory.

1. Extract the contents of the GDQStandardizeData zip file to a data folder in the install directory, e.g. C:\Program Files\Experian\Standardize\Data.
See Advanced configuration for setting an alternative directory.
Installing or running the service
Experian.Gdq.Standardize.Standalone.exe can be run simply by executing it directly. However, it is preferable to install it as a Windows Service.

1. In an Administrator PowerShell prompt, navigate to the installation location.
2. Run .\GdqStandardizeServiceManager.ps1 install. This will register the service in the Windows Services console.
3. Run .\GdqStandardizeServiceManager.ps1 start. This will attempt to start the service.
4. If you encounter any errors, check your configuration and try again. Note that the service will not start if the data or license key is wrong. All errors are written to the system event log. Alternatively, it may be helpful to run the executable directly from the command line while troubleshooting, as this will highlight the error.
Advanced configuration
The following parameters within the configuration file Experian.Gdq.Standardize.Standalone.config can be modified:

Parameter | Description |
---|---|
FilePath | Path to the data which GdqStandardizeServer requires to function. Default: './Data' |
hostIp | The IP address on which GdqStandardizeServer will listen for input. Default: 127.0.0.1 |
port | The port on which GdqStandardizeServer will listen for input. Default: 5000 |
defaultCountry | The default country which GdqStandardize will use when processing records. Default: GBR |
defaultCountryInfluence | The default level of influence which GdqStandardize will use when processing records. We recommend leaving this unchanged in most cases and if input records cover multiple countries. A higher value, for example 500, can be used to force Experian Match to treat all input records as coming from the default country. Default: 50 |
Override alias to rootname mappings
It is possible to override the file containing the alias to rootname mappings using the property "standardisation.rootname.file.path".
Matching standardisation port
The Matching product is configured to use GdqStandardizeServer at 127.0.0.1:5000 by default. To change this, edit the application.properties file in the deployment directory and add two properties, standardisation.host and standardisation.port, set to the required values.
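For example, if the Standardize service is listening on another host or port, the two properties in application.properties might look like the following (the host and port values here are purely illustrative):

# application.properties - point Experian Match at the Standardize service
standardisation.host=10.0.0.15
standardisation.port=5055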
Country processing
When standardising records, GdqStandardize needs to know what country the data in the record is referring to, in order to derive more information. Proper, country-specific standardisation affects the rest of the Matching process, as it changes how potential matches are found.
The defaultCountry configuration setting influences which country the standardisation system will assume the record is from. However, this influence can be overridden on a per-record basis by specifying an ISO 3166-1 alpha-3 code in the input data. The data should be mapped to the COUNTRY data type.
Common ISO codes:
Country | ISO 3166-1 alpha-3 |
---|---|
United Kingdom | GBR |
United States | USA |
Australia | AUS |
France | FRA |
Deploying
Experian Match REST API must be deployed under an application server. Instructions below are given for Apache Tomcat.
Install the latest stable version of Tomcat (currently 8.5) according to the Apache installation instructions.
Experian Match REST API is deployed like any other web application, by copying the supplied war file to the CATALINA_HOME\webapps directory.
To check that your deployment was successful, navigate to http://localhost:{port}/matching-rest-api-{VersionNumber}/swagger-ui.html. The default Tomcat port is 8080.
Persistent configuration
By default, configurations created via the REST API are not saved to disk. As such, when the application server shuts down (or is redeployed) any configurations are lost, and must be recreated on start-up. To enable persistent configuration, set a path using the matching.configLocation property.
Memory tuning
Ensure that you have as much memory allocated to the Tomcat JVM as possible.
The maximum heap size should be set as high as possible, while allowing sufficient memory for the operating system and any other running processes, using the -Xmx setting. A maximum heap size of at least 8GB is recommended.
export CATALINA_OPTS="$CATALINA_OPTS -Xms1g -Xmx12g"
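The line above suits a Linux installation where CATALINA_OPTS is exported before Tomcat starts (for example from CATALINA_HOME/bin/setenv.sh). On Windows, an equivalent sketch when Tomcat is started via catalina.bat/startup.bat would be a CATALINA_HOME\bin\setenv.bat containing:

rem Illustrative heap settings for the Tomcat JVM
set "CATALINA_OPTS=%CATALINA_OPTS% -Xms1g -Xmx12g"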
Database drivers
If you are planning to connect to a SQL database you will need to make sure that you have installed the relevant JDBC database drivers.
Note: If your configurations only include Flat Files, Mongo, or HSQL, the following steps are not required and can be skipped. Installation of the SQL drivers can be done at a later date by following the steps and restarting the application.
To install the SQL Server drivers:
1. Download the .exe file from https://www.microsoft.com/en-us/download/details.aspx?id=11774. The .exe file is an archive that you will need to unzip.
2. Unzip this to a location of your choice and navigate to this location.
3. Copy the sqljdbc42.jar file located in {EXTRACT LOCATION}\sqljdbc_6.0\enu\jre8 to your CATALINA_HOME\lib directory.
To install the Oracle drivers:
1. Download the ojdbc6.jar file from http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
2. Copy the ojdbc6.jar file to your CATALINA_HOME\lib directory.
You will need to supply the relevant driver name when making an API request that includes the connectionSettings object. You can find more information on configuring JDBC connections in the Datasources section. The driver class names are:

- SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver
- Oracle: oracle.jdbc.driver.OracleDriver
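As a sketch, the connectionSettings for a SQL Server data source might then look like this (server, database, table and credentials are placeholders; the JDBC URL uses the standard Microsoft format):

{ "JdbcUrl" : "jdbc:sqlserver://dbserver:1433;databaseName=Customers", "JdbcDriver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver", "Table" : "CUSTOMER_RECORDS", "UserName" : "match_user", "Password" : "********" }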
Licensing
Experian Match must be deployed and correctly licensed before Match jobs can be run.
Ensure the steps in Deploying are complete. During deployment Experian Match generates a number of licensing files; see Advanced license configuration for further details.
1. Navigate to http://localhost:{port}/matching-rest-api-{VersionNumber}/swagger-ui.html#!/System/getSystemStatus. Get the SystemStatus to retrieve the LicenceStatus object of the form:

{
  "warnings": "",
  "licenseStatus": {
    "isLicensed": false,
    "licenseExpiry": null,
    "updateKey": "1ABC4EF6G9HIJKLMPBJJ61",
    "licenseFolder": "C:\\ProgramData\\Experian",
    "dllLocation": "C:\\Users\\username\\AppData\\Local\\Temp\\Experian-Licensing-8028610556869156823",
    "messages": "License not found."
  }
}

2. Your Experian support representative will request the updateKey and generate an updateCode for you to apply to Experian Match.
3. Apply the updateCode to Experian Match via the endpoint http://localhost:{port}/matching-rest-api-{VersionNumber}/swagger-ui.html#!/System/applyUpdateCode. Experian Match will return the new LicenceStatus containing the license state and expiry information. Match jobs can now be run.
Advanced license configuration
By default Experian Match generates and stores licensing information within C:\ProgramData\Experian. Licensing DLLs are copied to a temporary directory on deployment of the application. Both of these locations can be overridden using configuration properties if required. If a configuration property has been set, Experian Match expects the requisite files to exist at that location:

- matching.licensing.licensefolder
  - edqlic.ini - the softkey directory file (initially empty).
  - edqlickeys.ini - the file containing the base license key.
- matching.licensing.dllpath
  - EDQLicED.dll or EDQLicGD.dll for 32-bit and 64-bit operation respectively.
Configuring
Matching has the following configuration properties:
Name | Type | Default | Description |
---|---|---|---|
matching.configLocation | java.lang.String | Configuration will not be persisted | A directory where configuration settings will be persisted |
standardisation.rootname.file.path | java.lang.String | The internal root name file will be used | The location of an alias to rootName file |
matching.licensing.licensefolder | java.lang.String | C:\ProgramData\Experian | Path to the directory containing the license information |
matching.licensing.dllpath | java.lang.String | Temp folder | Path to the directory containing licence DLLs (EDQLicGD.dll, EDQLicED.dll) |
matchstore.purge | java.lang.String | false - Match Stores will be persisted by default | Whether to purge the Match Store once the job has completed |
Adding configuration using Tomcat
1. Create an xml file under CATALINA_HOME\conf\Catalina\localhost\. This file should have the same name as the deployed WAR file. If the WAR has been deployed under ROOT, the configuration file should be called root.xml.
2. Add the required configuration as an Environment property in the <Context> block.

For example:

<Context>
<Environment name="matching.configLocation" value="c:\Experian\Match\config" type="java.lang.String" override="true" />
<Environment name="standardisation.rootname.file.path" value="c:\Experian\Match\rootNames.txt" type="java.lang.String" override="true" />
</Context>
A Tomcat restart is required to load the settings after this file is created or modified.
matching.configLocation
It is recommended that the save location is set to a directory outside the Tomcat installation directory, as this is often a volatile directory, and not suitable for user data.
This directory should be included as part of a standard backup policy.
standardisation.rootname.file.path
The file should contain no header and only two columns separated by a comma:
- Alias - The first column contains the name aliases. These should be unique. If an alias appears more than once, only the last entry that appears in the file will be used during Standardisation.
- RootName - The second column contains the root name which the alias maps to. The same root name can appear multiple times in this column.
An example of how this file should look is as follows:
ABBY,ABIGAIL
ABDLE,ABDUL
ABDOU,ABDUL
ABDUH,ABDUL
ABY,ABIGAIL
Tomcat must be restarted after this file is created or modified.
Datasources
If you are connecting to a database you will need to supply credentials for a database user, so that the system can read from the data source containing the records you wish to match and write the output results. The user must have db_reader permission on the data source, and db_writer permission on the output table.

A different database user is also required to manage the system's index tables. This user must have db_owner, db_reader, and db_writer permissions, as it will create and drop its index tables and must be able to read from and write to them.
The valid settings for connecting to data sources, output sinks, or index stores are shown below.
FLATFILE
Values with a (F) or (D) are valid only for either Fixed Width or Delimited respectively and cannot be mixed.

{ "Path" : "<path to file accessible by matching system>", "Header" : "<whether the first line contains headers (true|false)>", "LineEnding" : "<string that indicates a newline>", "Delimiter" : "<delimiter char (D)>", "Quote" : "<character to quote strings that contain the delimiter: should only be \" or ' and defaults to \" (D)>", "ColumnWidths" : "<comma separated list of column widths (F)>", "PaddingChar" : "<character with which to pad columns in fixed-width files (F)>" }
Search and transactional updates are not supported with flat file data sources.
Note that flat file input size is limited by the amount of memory allocated to the JVM. This is because all input fields from a flat file source are loaded into memory at the start of the match job. For large input files it is recommended that the JVM maximum heap size be set to at least 8GB.
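For illustration, a minimal configuration for a delimited CSV source with a header row might look like the following (the path is a placeholder, and the remaining optional fields are omitted for brevity):

{ "Path" : "C:\\Experian\\Match\\input\\customers.csv", "Header" : "true", "Delimiter" : ",", "Quote" : "\"" }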
JDBC
{ "JdbcUrl" : "<a valid JDBC URL>", "JdbcDriver": "<the JDBC driver to use>", "Table" : "<the name of the table to use>", "UserName" : "<the optional username to connect with>", "Password" : "<the optional password for the username>", "HashBlockingKeys": "true|false" }
HashBlockingKeys is a Match Store setting only. See Hashing under Tuning in the documentation for more information on this.
HSQL Data Source
The Experian Match .war file contains support for HSQL database match stores. HSQL is a simple, disk backed, in-memory relational database management system.
Note that due to the limitations of memory stores, an HSQL database is not recommended for production deployments.
Experian Match will generate a HSQL database on a local file path when configured as a JDBC connection as follows:
{ "JdbcUrl" : "jdbc:hsqldb:file:<escaped path to generated database>", "JdbcDriver": "org.hsqldb.jdbc.JDBCDriver", "UserName" : "sa", "Password" : "" "Table" : "<the name of the table to use>", }
A relative database path such as match\\hsql.db will store the database relative to your web server. Alternatively, an absolute path such as c:\\temp\\experianmatch\\hsql.db can be used. On shutdown of Experian Match, the database will be persisted to disk at the location defined.
For JDBC connections, four tables will be created:

- <table_name> as specified in the connectionSettings (e.g. Cust_Index), storing record information.
- <table_name>_KEYS (e.g. Cust_Index_KEYS), storing blocking keys for each record.
- <table_name>_CLSTRS (e.g. Cust_Index_CLSTRS), storing cluster information.
- <table_name>_SCORE_PAIRS (e.g. Cust_Index_SCORE_PAIRS), storing results of record comparison and rule evaluation.
JDBC Data Source Tuning
Experian Match utilises HikariCP connection pooling, which can be configured externally to Experian Match through the addition of a properties file named hikari.properties in the web application's resource directory. Experian Match is configured to use the default settings; additional tuning may be necessary based on the JDBC data source used. See the Hikari configuration settings for full details and suggestions for configuring connectionTimeout, maximumPoolSize, and dataSource.*.
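As a sketch, a hikari.properties file overriding the two settings mentioned above might contain the following (the values are illustrative and should be tuned for your database and workload):

# Maximum time in milliseconds to wait for a connection from the pool
connectionTimeout=30000
# Maximum number of connections in the pool
maximumPoolSize=20
# Entries prefixed with dataSource. are passed through to the underlying driver/DataSource
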
MONGO
{ "MongoUrl" : "<the URL of the MongoDB instance>", "MongoPort" : "<the port to use when connecting to the MongoDB instance>", "Database" : "<the database to use>", "Collection" : "<the collection to use>", "UserName" : "<the optional username to connect with>", "Password" : "<the optional password for the username>" }
Note that the maximum size of the Mongo source collection is limited by the amount of memory allocated to the JVM, since all source fields are loaded into memory at the start of the match job. For large source collections it is recommended that the JVM maximum heap size be set to at least 8GB.
JMS
{ "JmsQueueName" : "<the JMS queue name>", "BrokerURL" : "<the URL of the broker to use>", "UserName" : "<the optional username to connect with>", "Password" : "<the optional password for the username>", "EndOfQueueString" : "<the string to signify the end of queue, default value 'EOF'>", "TimeoutMS" : "<the timeout for reading from the JMS queue in milliseconds, default value 5000>" }
Search and transactional updates are not supported with JMS data sources.
If JMS is used as an input data source then fields from this source cannot be referenced in the output mapping. If JMS is used as the output source it will only output the results of the matching jobs. No input source fields will be included.
A unique end of queue string which does not occur in the input data source should be supplied to signify the end of a JMS queue. It is advisable to use different JMS queue names for connecting to the JMS data source and the JMS output source.
Tuning
Rules
Concepts
In order to create your own rule set or modify the default rule set it is important to understand the concepts.
Rules take the following form: <rule reference>=<expression>
A rule reference consists of a rule name followed by a ‘.’ followed by a match level.
An expression may take multiple forms. It is either a low-level expression operating on the elements within a record, or a higher-level expression composed of references to other rules.
A rule set is made up of a combination of three rule types; these increase in specificity from Match to Theme to Element. Match and Theme rules are comprised of references to rules from the level below. Element rules are comprised of rules set on specific data elements, for example a postcode or building number.
We can visualise this using a tree diagram:

Syntax
- All rules have a rule reference on the left hand side (LHS):
  - Rule reference = <rule name>.<match level>
  - Rule name = 'Match' (match rule), custom identifier (theme/multi-element rule), or element (single element rule). Custom identifiers must begin with an alphabetical character and can consist of numerical characters, underscores and hyphens.
  - Match level = Exact, Close, Probable, Possible
- The right hand side (RHS) is always surrounded by {}:
  - The RHS may include logical operators & (and) and | (or).
  - Expressions may be nested and logical operators combined (parentheses are required), e.g. MyRule.Probable = {((RuleA.Possible & RuleB.Probable) | (RuleA.Probable & RuleB.Possible)) | RuleC.Exact}
- Element rules include the element and the allowed result set (enclosed in [], comma separated) and may also include an optional element modifier and/or comparator.
- Any theme or element rule may also optionally include a field group from the input field mappings, which is identified using a hash symbol before the field group name, e.g. #MyFieldGroup.PostcodeTheme.Exact = {Postcode[ExactMatch]}. See the Field Groups section below for further information.
Match levels
There are four match levels that can be used within each rule specification. These are Exact, Close, Probable and Possible. When working with match levels, note that:
- Match levels will be evaluated in order from Exact through to Possible, stopping at the first level that passes.
- Records will be considered a match, and be clustered together, if any of the defined top level overall rule match levels evaluate to true.
- Every rule must include a match level as part of the rule reference (LHS).
Rule evaluation
Rules are evaluated as follows:
- Match rules first - the Match.<MATCHLEVEL> rules
- Left to right
- Higher order rule matches mean that lower match levels are not required to be evaluated.
- Evaluated lazily
Using rules with Experian Match
Rule sets are supplied as a JSON escaped string when making a rules configuration request. JSON has a number of reserved characters, such as line breaks that need to be escaped when supplying a string as part of a request. This involves replacing these characters with the relevant escape characters e.g. \n for a line break.
Long JSON escaped strings, such as a matching rule set, are not very human readable. We recommend that you make changes to your rules before escaping the string. There are a number of free online tools that will escape JSON for you.
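As a minimal sketch, a two-rule rule set and its JSON-escaped form (with \n standing in for the line break) would look like this; the rule names here are illustrative, and a real rule set will be far larger (see the default rule sets below):

Match.Exact={Name.Exact}
Name.Exact={Forenames[ExactMatch] & Surname[ExactMatch]}

JSON-escaped for use in a rules configuration request:

"Match.Exact={Name.Exact}\nName.Exact={Forenames[ExactMatch] & Surname[ExactMatch]}"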
The default rule sets for each supported country can be found in the Default Rules section below.
Rule types
There are three rule types that can be defined within the rules: Match, Theme and Element (single element, multi element, generic string and date).
Match rule
This is the highest level of rule, defining an overall match between two records. A match rule is made up of references to other rules. The rule name must always be ‘Match.<Match Level>’.
At least one match rule must be defined for a successful matching job.
Example: Match.Exact={Name.Exact & Address.Exact}
(Name.Exact and Address.Exact have been defined separately)
Rule references can be combined into compound logical expressions. In this way, you have complete control over the logic used to determine matches.
Example: Match.Close={(Name.Exact & Address.Exact) | (Name.Exact & Email.Exact & Phone.Exact)}
Theme rule
This is the next level down. Much like a match rule, a theme rule is made up of references to other rules. The rule name must begin with an alphabetical character and can consist of numerical characters, underscores and hyphens, excluding ‘Match’ and the set of reserved elements (explained further below).
Example: Address.Exact={Premise.Exact & Street.Exact & Locality.Exact & Postcode.Exact}
In a theme rule, the rule references within the expression (i.e. on the RHS) could either be further theme rules or low-level element rules.
Element rule
Element rules are the most granular type of rule. They can be used to specify how to compare individual elements within a record. Elements are basic units of data that comprise an overall theme; a postcode or premise, for example, are address elements.
Single element rule
This is the simplest type of element rule, operating on a single element.
Example: Title.Exact={[ExactMatch,NonePopulated]}
The element to compare (Title) is specified on the LHS. On the RHS, we have a set of allowed results. For this rule to evaluate to true – i.e. for us to consider a Title match to be Exact – the Title elements in two records must either be an exact match (i.e. the strings must match exactly) or both be blank (NonePopulated).
We can also specify element modifiers and custom comparators.
Example: Title.Close = {StandardAbbreviation.Levenshtein[90%,NonePopulated]}
Here, the element modifier is StandardAbbreviation – meaning the standardised form of the title (e.g. Mister → Mr.) and the comparator is Levenshtein (an approximate string matching algorithm). This rule evaluates to true if the two StandardAbbreviation forms of the Title element are >=90% similar or if they’re both blank.
Where no element modifier is supplied, the unmodified element will be used.
Where no comparator is supplied, the default will be used – this does exact string matching, and may return ExactMatch, OnePopulated, NonePopulated, NoMatch.
If you wish to use multiple element modifiers and/or comparators in combination, it is possible to build up more complex rules.
Example: Forenames.Close={JaroWinkler[90%] | RootName[ExactMatch]}
This means, “for forenames to match closely, either the unmodified forenames element must match >=90% using the Jaro-Winkler approximate string matching algorithm, or the root name derived from the forename must match exactly” (this latter part allows Bill and William to match for example).
Multi-Element Rule
A multi-element rule allows you to compare multiple elements within a single rule.
Example: SubBuilding.Exact={SubBuilding_Number[ExactMatch,NonePopulated] & SubBuilding_Description[ExactMatch,NonePopulated] & SubBuilding_Type[ExactMatch,NonePopulated]}
This compares 3 elements – SubBuilding_Number, SubBuilding_Description, SubBuilding_Type, using no element modifier and the default comparator in all cases. Here, the element type must be specified on the RHS, and the rule name on the left is a custom identifier.
Note that it would be possible to achieve the same result using 3 single element rules and a theme rule to combine them, however the above is preferred for brevity/readability.
Field Groups
Any element can be used multiple times in input to represent separate/unique sets of information. For example, the input may include:
- multiple street numbers, street names and postcodes, e.g. a delivery address and a billing address
- multiple forenames and surnames, e.g. a primary account holder and a spouse
- multiple generic strings, e.g. an account reference and a customer reference
- multiple dates, e.g. a date of birth and a registration date
- multiple phone numbers, e.g. a mobile number and a home number
- multiple email addresses, e.g. a personal address and a work address
It may be desirable to handle these separately during rule evaluation e.g. have a more strict rule for a billing address but a more flexible rule for the delivery address.
To support this, any element can be mapped as part of a separate field group, and the field group may be specified as part of the rule expression. This is achieved by prefixing the rule with a hash symbol, followed by the field group name, followed by a full stop, before then writing the rest of the rule e.g. #FieldGroupName.Name.Exact=
.
There are a number of places where the field group can be used:
- on the LHS within an element rule, e.g. #Parent.Surname.Exact={[ExactMatch]} - in this example, the exact rule on the surname element only applies to the surname in the 'Parent' field group
- on the LHS within a theme/match rule, e.g. #Delivery.Address.Exact={Address.Exact} - in this example, the exact address theme rule only applies to address elements contained in the 'Delivery' address field group
- on the RHS within an element rule, e.g. Title.Exact={#Parent.Title[ExactMatch,NonePopulated]} - in this example, the exact rule on the title element only applies to the title in the 'Parent' field group
- on the RHS within a theme/match rule, e.g. Match.Exact={#Billing.Address.Exact & #Delivery.Address.Exact & Name.Exact} - in this example, the exact match rule requires the name to be exact as well as the 'Billing' and 'Delivery' address field groups to be exact
Note that the field group identifier cannot be used on both sides of the rule expression. For example, #AccountID.Close={#AccountID.GenericString[ExactMatch]} is not a valid expression.
A field group name must begin with an alphabetical character and can consist of numerical characters, underscores and hyphens. A field group name cannot include a hash symbol itself, e.g. #Account#ID.Exact will not be accepted.
Match.Exact={#Delivery.Address.Exact & #Billing.Address.Exact & GenericStringCombined.Exact}
#Billing.Address.Exact={StreetNumber.Exact & PostcodeTheme.Exact}
#Delivery.Address.Exact={Address.Exact}
Address.Exact={PostcodeTheme.Exact}
StreetNumber.Exact={MinorStreet_Number.PremiseCompare[ExactMatch]}
PostcodeTheme.Exact={Postcode.PostcodeCompare[Part1Match] & Postcode.PostcodeCompare[Part2Match]}
GenericStringCombined.Exact={#UserId.Generic_String[ExactMatch] & #MembershipID.Generic_String.JaroWinkler[95%]}
The example above shows how field groups can be used at all levels of the rule hierarchy. The #Billing address group requires the street and postcode within that group to be exact matches, while the #Delivery address group only requires the postcode within that group to be exact. Additionally, the generic string element rule evaluates the two different generic strings from the different field groups in two different ways.
Elements
These are the available elements and the comparators that can be used with them.
Element Name | Element Description | Example | Available Comparators | Available ElementModifiers |
---|---|---|---|---|
Title | Title | Mrs | ExactString | Default, StandardSpelling, StandardAbbreviation |
Forenames | Given name/names and any initials | John | ExactString, ForenameCompare, Levenshtein, JaroWinkler | Default, RootName |
Surname_Prefix | Surname prefix | De la | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
Surname | Surname with prefix | Smith | ExactString, Levenshtein, JaroWinkler | Default |
Surname_Suffix | Surname suffixes | Junior | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
Gender | Gender | Female | ExactString, Levenshtein, JaroWinkler | Default |
Honorifics | Honorifics | Ph.D | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
Building_Description | Building name and type | George West House | ExactString, Levenshtein, JaroWinkler | Default |
Building_Number | Building number | 43 | ExactString, PremiseCompare | Default |
SubBuilding_Number | Sub-building number | 2 | ExactString, PremiseCompare | Default |
SubBuilding_Description | Sub-building name | First-floor | ExactString, Levenshtein, JaroWinkler | Default |
SubBuilding_Type | Sub-building type | Flat | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
MinorStreet_Number | Street number | 34th | ExactString, PremiseCompare | Default |
MinorStreet_Predirectional | Street pre-directional | South | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
MinorStreet_Description | Street name | Johnston | ExactString, Levenshtein, JaroWinkler | Default |
MinorStreet_Type | Street descriptor | Street | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
MinorStreet_Postdirectional | Street post-directional | South | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
PoBox_Number | PO Box number | 79 | ExactString, Levenshtein, JaroWinkler | Default |
PoBox_Description | PO Box description | PO Box | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
DoubleDependentLocality | A small locality such as a village, used to identify an address where a street appears more than once in a dependent locality | Kingston Gorse | ExactString, Levenshtein, JaroWinkler | Default/Derived, StandardSpelling |
DependentLocality | Smaller locality used to identify an address where a street appears more than once in a locality | East Preston | ExactString, Levenshtein, JaroWinkler | Default/Derived, StandardSpelling |
Locality | A larger locality such as a town or city | Cambridge | ExactString, Levenshtein, JaroWinkler | Default/Derived, StandardSpelling |
Province | A larger area of a country, contains multiple localities | Cambridgeshire | ExactString, Levenshtein, JaroWinkler | Default/Derived, StandardSpelling, StandardAbbreviation |
Country | Country name | United Kingdom | ExactString, Levenshtein, JaroWinkler | Default |
Postcode | Postal code | SW4 0QL | ExactString, PostcodeCompare | Default/Derived, Levenshtein, StandardSpelling |
Generic_String | Generic string | ab-1234cdef | ExactString, Levenshtein, JaroWinkler | Default |
Date | ISO Date in the format YYYY-MM-DD | 1980-06-21 | ExactString, DateCompare | Default |
Phone | Phone number | (01234) 567890 | ExactString, Levenshtein, JaroWinkler | Default |
Email | Email address | john.smith@domain.com | ExactString, Levenshtein, JaroWinkler | Default |
Email_Local | Local part of email address | john.smith | ExactString, Levenshtein, JaroWinkler | Default |
Email_Domain | Email domain | domain.com | ExactString, Levenshtein, JaroWinkler | Default |
Comparators
The following comparators can be used. The results available for the default comparator (ExactString) will also be available for all other comparators.
Comparator | Results |
---|---|
ExactString (default comparator) | ExactMatch: strings match exactly, e.g. "John Smith" & "John Smith". OnePopulated: the field is populated for only one of the records, e.g. "John Smith" & "". NonePopulated: the field is not populated for either of the records, e.g. "" & "". NoMatch: the strings are both populated but are not an exact match, e.g. "John Smith" & "John Doe". |
ForenameCompare | InitialVsFullName: an initial or initials match the full name, e.g. "S J" & "Sarah Jane". Plus all ExactString (default comparator) results. |
PremiseCompare | StartMatch: premise matches the start of a premise range, e.g. "12" & "12-15". StartMatchAndEncapsulated: premise ranges match at the start and one encapsulates the other, e.g. "12-15" & "12-16". EndMatch: premise matches the end of a premise range, e.g. "15" & "12-15". EndMatchAndEncapsulated: premise ranges match at the end and one encapsulates the other, e.g. "13-16" & "12-16". Encapsulated: premise or premise range is encapsulated by the other, e.g. "12" & "11-16". Overlapped: premise ranges overlap each other, e.g. "12-15" & "14-18". NumberMatchWithTrailingAlpha: premise numbers match and one record has a trailing alpha, e.g. "12" & "12a". NumberMatchWithDifferingAlpha: premise numbers are an exact match but the trailing alpha is different, e.g. "12a" & "12b". Plus all ExactString (default comparator) results. |
DateCompare | DayMonthReversed: matches dates where the day and month are reversed, e.g. "2017-06-03" & "2017-03-06". MonthYearMatch: matches dates where only the month and year match, e.g. "2017-06-03" & "2017-06-04". DayMonthMatch: matches dates where only the day and month match, e.g. "2017-06-03" & "2016-06-03". DayYearMatch: matches dates where only the day and year match, e.g. "2017-06-03" & "2017-07-03". YearMatch: matches dates where only the year matches, e.g. "2017-06-03" & "2017-07-04". Plus all ExactString (default comparator) results. |
PostcodeCompare | Part1Match: records match on the first part of the postcode, e.g. "HA2 9PP" & "HA2 5QR". Part2Match: records match on the second part of the postcode, e.g. "SM1 9PP" & "HA2 9PP". Plus all ExactString (default comparator) results. |
Levenshtein | Results depend upon the specified comparison type. Plus all ExactString (default comparator) results. |
JaroWinkler | <Minimum %>: the minimum Jaro-Winkler distance percentage to provide a match (integer between 0-100), e.g. setting the JaroWinkler result to 95 would return a match for "John Smith" & "Joan Smith". Plus all ExactString (default comparator) results. |
Element modifiers
The Match REST API is able to identify and correct many known terms that may appear in the input record. A selection of element modifier keywords can be used to retrieve modified versions of the input elements within rule definition.
<rule name>.<match level>={<element>.<element modifier(optional)>.<comparator(optional)>[comparator results] (&...)}
ElementModifier | Operation |
---|---|
(Default) | The element classified from the input, in a cleaned form. |
StandardSpelling | The element converted to a standard spelling (contains the Derived value when available). |
StandardAbbreviation | The element converted to the standard abbreviation. |
Derived | A derived value that was inferred from other information in the input address. |
Example: MinorStreet_Type.Exact={MinorStreet_Type.StandardSpelling[ExactMatch]}
Examples
Initial vs full name
 | Forenames | Surname |
---|---|---|
Record 1 | Robert | Brooke |
Record 2 | R | Brooke |
Name.Probable = {Forenames.ForenameCompare[InitialVsFullName] & Surname[ExactMatch]}
Minor street number
 | MinorStreet_Number | MinorStreet_Description | MinorStreet_Type |
---|---|---|---|
Record 1 | 123 | Burnthouse | Lane |
Record 2 | 123a | Burnthouse | Lane |
StreetAddress.Close = {MinorStreet_Number.PremiseCompare[NumberMatchWithTrailingAlpha] & MinorStreet_Description[ExactMatch] & MinorStreet_Type.StandardAbbreviation[ExactMatch]}
Postcode
 | MinorStreet_Description | MinorStreet_Type | Locality | Postcode |
---|---|---|---|---|
Record 1 | Hints | Road | Tamworth | B78 3AB |
Record 2 | Hints | Road | Tamworth | B78 3AT |
Address.Probable = {Building_Number[ExactMatch] & MinorStreet_Description[ExactMatch] & Locality[ExactMatch] & Postcode.PostcodeCompare[Part1Match]}
Default Rules
Download text file containing the JSON escaped default rules string for the United Kingdom.
Download text file containing the JSON escaped default rules string for Australia.
Blocking key configuration
To be effective, blocking keys should represent a range of contact data sub-element combinations. Experian Match can provide default blocking keys tuned for Name and Address matching. These blocking keys may need modifying to suit the input data and use case.
The default blocking keys for a country can be obtained using the GET request /v2/configuration/blockingKey/default/{countryISO3}, specifying a country ISO3 code. Blocking keys are currently included for the United Kingdom (GBR) and Australia (AUS).

Each blocking key is defined by a BlockingKeyConfigModel object, and the default response is an array containing a list of these key specifications. Blocking keys can be added to Experian Match by a POST to /v2/configuration/blockingKey containing an array of BlockingKeyConfigModel objects. Each key is allocated an ID to use when creating the session configuration. Blocking keys are added to a session configuration as a blockingKeyId array.

A search session should contain all, or a subset, of the blocking keys from the session which created the match store. This is because searches will only use blocking keys that have been created during the match job.
Note: If the blocking keys are updated on the session then the match job will need to be re-run to update the blocking keys in the match store.
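A typical workflow is therefore to fetch the default keys, adjust them, and submit them back before creating the session. A rough sketch using curl, assuming the API is served under the matching-rest-api-2.7.1 context path on port 8080 (as in the tutorial later in this document) and that the edited keys have been saved to blockingKeys.json:

# Retrieve the default blocking keys for the United Kingdom
curl -X GET "http://localhost:8080/matching-rest-api-2.7.1/v2/configuration/blockingKey/default/GBR"

# Submit the edited array of BlockingKeyConfigModel objects
curl -X POST "http://localhost:8080/matching-rest-api-2.7.1/v2/configuration/blockingKey" -H "Content-Type: application/json" -d @blockingKeys.json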
All of the Elements that are available in the rules can also be mapped in a BlockingKeyConfigModel. Keyed forms of these elements are combined to form blocking keys. The Blocking Key Element Algorithms table lists the available keying algorithms.
Note: Element names, element modifiers and BlockingKeyElementAlgorithm names specified in the blocking key specifications must be all upper case. For example, to use the MinorStreet_Number element in a key it must be specified as MINORSTREET_NUMBER.
Blocking Key Element Algorithms
The keying of each element is defined by a BlockingKeyElementAlgorithm object. If no BlockingKeyElementAlgorithm is specified the SIMPLIFIED_STRING algorithm is used.
name | Description | Keyed Example | properties |
---|---|---|---|
NO_CHANGE | No modification - retains spaces | "ANDREW J" ⇒ "ANDREW J" | |
SIMPLIFIED_STRING | Remove spaces | "ANDREW J" ⇒ "ANDREWJ" | |
DOUBLE_METAPHONE | Double metaphone part 1 | "ANDREW J" ⇒ "ANTR" | |
DOUBLE_METAPHONE_FIRST_WORD | Double metaphone part 1 | "ANDREW J" ⇒ "ANTR" | |
NYSIIS | Nysiis | "ANDREW J" ⇒ "ANDRAJ" | |
SOUNDEX | Soundex | "ANDREW J" ⇒ "A536" | |
CONSONANT | Only consonants | "ANDREW J" ⇒ "NDRWJ" | |
INITIAL | Initial value | "ANDREW J" ⇒ "A" | |
START_SUBSTRING | Substring from beginning | "ANDREW J" ("length":3) ⇒ "AND" | length |
MIDDLE_SUBSTRING | Substring from start to end | "ANDREW J" ("start":2, "end":5) ⇒ "NDRE" | start, end |
END_SUBSTRING | Substring from end | "ANDREW J" ("length":3) ⇒ "W J" | length |
The CONSONANT and SOUNDEX key types support the following character sets: Basic Latin (ASCII), Latin-1 Supplement, Latin Extended-A, Latin Extended Additional.
All other key types have been tested with the following Latin character sets: Basic Latin (ASCII), Latin-1 Supplement, Latin Extended-A, Latin Extended-B, Latin Extended-C, Latin Extended-D, Latin Extended-E, Latin Extended Additional, IPA Extensions, Phonetic Extensions, Phonetic Extensions Supplement.
Hashing
Blocking keys are hashed when written to the match store. This obfuscates them for security in cases where the match store is encrypted but the blocking key value must remain unencrypted. Hashing also improves the performance of blocking: because hashing produces values of the same length, the database is able to create a more efficient index over the blocking keys.
Note: When using a relational database as a match store, hashing blocking keys will affect the type of the blocking key value column. With hashing enabled the column will be CHAR(40); when hashing is disabled it will be VARCHAR(255).
When running a job with hashing enabled, it is not possible to run a job with hashing disabled against the same match store. The match store tables will have to be dropped before running a job with hashing disabled.
Hashing has no impact on Search or Transactional operations.
Output
To improve system performance, it is recommended that only the match result fields (e.g. $SOURCE_ID, $RECORD_ID, $CLUSTER_ID and $MATCH_STATUS) are configured in the output mapping.
Datasource field mapping
Field mapping refers to the database fields you would like to use for matching. The available field types that you can match on are listed below:
Field | Field description |
---|---|
 | ID of the input record. This must be unique per source. |
 | Any name element. For best results ensure |
 | The title element of a full name. Cannot be used in combination with any |
 | The given name/names of a full name. Cannot be used in combination with any |
 | The surname/names of a full name. Cannot be used in combination with any |
 | Any Address element. For best results ensure input |
 | Fields that contain premise and street elements such as premise, building or PO Box number and a street or building name. |
 | Localities such as a town or city. |
 | Larger area of a country such as a county which contains multiple localities. |
 | Postal code. |
 | Country name, either as a full name or ISO code. To enable country processing an ISO 3166-1 alpha-3 country code must be specified. |
 | A generic string element. |
 | A simple date element. It is recommended that the data is cleansed before running a Match job. It must be in the ISO date format YYYY-MM-DD. |
 | Phone number. |
 | Email address. |
Troubleshooting
- If your matching and output job is successful but your output data is empty or incomplete, this may mean your input fields don't match your data.
  - When using a FLATFILE for input, check that your input field numbers match correctly to the relevant column (column numbers start at 1).
  - If writing to a JDBC database, ensure the output table has been created with the correct columns.
- If your matching job fails with the message [Standardisation Client] Failed to connect to server, matching cannot connect to GdqStandardize. Ensure the service is set up and running correctly by following the GdqStandardize setup instructions.
Logging
When run under Tomcat, the matching.log file can be found in CATALINA_HOME\logs\matching.log.
Logging is handled by the log4j framework. The logging behaviour can be changed by updating the deployed log4j2.xml, as described in the following sections.
Log levels
The log level is specified for each major component of Matching within its own section of the log4j2 configuration under the XML section '<Loggers>'. For example:
<Logger name="com.edq.matching.components.scoring" level="WARNING" additivity="false"> <AppenderRef ref="MatchingScoringLog"/> </Logger>
This specifies that the Scoring component would have a log level of 'WARNING' - this is the recommended default for all components. Each component can have the logging level increased or decreased to change the granularity in the log file.
The components of Experian Match that may be individually configured are:
Component | Description |
---|---|
com.edq.matching | The overall application; the level set here is the default to be applied if none of the below are configured. |
com.edq.matching.components | The main application components. This would give information regarding the processing pipeline of keying, blocking, scoring and clustering. |
com.edq.matching.dataconnector | The components that connect to data sources and sinks (databases, files, etc.). This would give information regarding the status of connections to the data endpoints, such as whether a particular file exists and is accessible. |
com.edq.matching.api | The components that interact with the user, including the REST endpoints. This would track the interactions between the user and the application. |
com.edq.standardisation | The API that interfaces to the standalone standardisation component. |
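For example, to get more detail from the data connectors while leaving other components at their defaults, that logger's level can be raised on its own (a sketch; the appender name should match the one defined in your log4j2.xml):

<Logger name="com.edq.matching.dataconnector" level="DEBUG" additivity="false"> <AppenderRef ref="MatchingLog"/> </Logger>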
The log levels in log4j follow the hierarchy in the table below. Therefore if you set the log level to DEBUG, you would get all the levels below DEBUG as well.
Level | Description |
---|---|
ALL | All levels |
TRACE | Designates finer-grained informational events than DEBUG |
DEBUG | Granular information; use this level to debug a package |
INFO | Informational messages that highlight the progress of the application at a coarse-grained level |
WARN | Potentially harmful situations |
ERROR | Error events that might still allow the application to continue running |
FATAL | Severe error events that will presumably lead the application to abort |
OFF | The highest possible rank, intended to turn off logging |
Logging outputs
Experian Match log
By default Experian Match is set to output the logs to CATALINA_HOME\logs\matching.log.
This can be changed by editing the below section of the log4j2.xml file:
<RollingFile name="MatchingLog" fileName="${LOG_DIR}/matching.log" filePattern="${ARCHIVE}/matching.log.%d{yyyy-MM-dd}.gz"> <PatternLayout pattern="${PATTERN}"/> <Policies> <TimeBasedTriggeringPolicy/> <SizeBasedTriggeringPolicy size="1 MB"/> </Policies> <DefaultRolloverStrategy max="2000"/> </RollingFile>
Adjusting the fileName attribute allows you to change the name and location.
Other logs
In addition to the Experian Match log, there are the following logs generated in the same file system location:
Log | Description |
---|---|
MatchingMetricsLog | Captures metrics reporting within the application (described below), logged to 'matching-metrics.log' |
MatchingScoringLog | Captures output from the Scoring processing, comprising the audited candidate pairs, logged to 'matching-scoring.log' |
These logs are configured in the same way as the Experian Match log.
Note that the MatchingMetricsLog will only operate if the Log Level for that Logger is set to 'DEBUG' and the Monitoring property enabled as described below.
Monitoring
Application monitoring is available, providing metrics output from the application. Output can optionally be written to a log file, to CSV files, or exposed via a web-based endpoint. Metrics are an advanced configuration that may be used for evaluating performance, supporting fine tuning, and debugging the application.
Monitoring configuration
This facility is configured in the system 'application.properties' file via a collection of parameters. These are:
Parameter | Default value | Description |
---|---|---|
matching.metrics.time.rate.seconds | 5 | Metrics measurement interval in seconds |
matching.metrics.csv.active | false | Metrics CSV file output is only active if this is true |
matching.metrics.csv.path | logs/metrics | Location for CSV format metrics files |
matching.metrics.log.active | false | Metrics are written to the log file (MatchingMetricsLog) if this is true and the log level is DEBUG |
matching.metrics.web.active | false | When true, metrics are available as a REST endpoint |
matching.metrics.endpoint | /monitoring/* | The REST endpoint to access web-based metrics (will resolve to http://localhost:{port}/monitoring?pretty=true) |
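For example, to enable CSV and web-based metrics output at a 10-second interval, the following could be added to application.properties (the values shown are illustrative):

matching.metrics.time.rate.seconds=10
matching.metrics.csv.active=true
matching.metrics.csv.path=logs/metrics
matching.metrics.web.active=true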
Monitoring output
Typical output comprises:
- Summary of tasks performed
- Summary of system usage (memory and CPU)
- Job status
- Number of threads used per component
- Individual metrics for each component of Matching, comprising the timings and rates of processing
For the system log file and web page these are all reported together; for the CSV-based metrics, each is written to its own individual file.
Session reporting
Blocking key reporting
A reporting operation is available that provides statistics about how blocking keys were used for a matching job. An example response is given below.
The response contains two sets of blocking key statistics for each key configured, and details of the block size threshold.

blockSizeThreshold is the maximum block size for which records with the same blocking key will be considered as candidate matching pairs. Experian Match records a threshold exception when the number of records in a block exceeds this value. The default value of 200 can be overridden with the maxBlockSize configuration setting when creating a blocking key. numThresholdExceptions is the number of exceptions recorded for each blocking key type.

Each set of statistics has the number of blocks generated and the minimum, maximum, mean, median and standard deviation of block size. statisticsExcludingExceptions are calculated using only the blocks that were included for matching because they were within the threshold. statisticsIncludingExceptions are calculated for all the blocks that could be generated, including block sizes above the threshold.
/v2/session/{sessionId}/blockingKeyStatistics
{
"blockSizeThreshold": 200,
"blockingKeyStatistics": [
{
"blockingKeyId": 1,
"description": "ForenamesSurname",
"numThresholdExceptions": 0,
"statisticsExcludingExceptions": {
"blockCount": 50,
"maximum": 10,
"mean": 1.3,
"median": 1.2,
"minimum": 1,
"standardDeviation": 0.75
},
"statisticsIncludingExceptions": {
"blockCount": 50,
"maximum": 10,
"mean": 1.3,
"median": 1.2,
"minimum": 1,
"standardDeviation": 0.75
}
}
]
}
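The blockSizeThreshold reported above reflects the maxBlockSize configured for the blocking keys. As noted, the default of 200 can be overridden when a key is created; a minimal sketch of such an override, following the shape of the default GBR blocking keys shown in the tutorial (the value 500 is purely illustrative):
{
  "description": "FullPostcode",
  "countryCode": "GBR",
  "maxBlockSize": 500,
  "elementSpecifications": [ { "elementType": "POSTCODE", "includeFromNChars": 5 } ]
}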
Matching job management
Match jobs allow you to perform workflow steps to build clusters for all records in the configured datasources, to output the clustered records, or to perform both steps with a single call.
Note: The system currently only supports running one job at a time.
-
Any other created jobs are added to a queue with a status of PENDING, and are run sequentially in order of job creation.
-
Just like the running job, pending jobs can be cancelled if no longer required.
-
This does NOT apply to maintenance operations; these will still be immediately rejected if another job is already running, and will not be queued.
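Because only one job runs at a time, it can be useful to inspect the queue and cancel pending jobs programmatically. A minimal sketch using Python and the requests library (an assumption of this sketch, not part of the product), against the ROOT Tomcat deployment used in the tutorial below, and assuming GET /v2/matching/job returns a JSON array of job status objects as described in the API reference:
import requests

BASE = "http://localhost:8080"  # assumes the ROOT Tomcat deployment used in the tutorial

# List all jobs and their states (PENDING/RUNNING/FINISHED/FAILED/CANCELLED)
jobs = requests.get(f"{BASE}/v2/matching/job").json()

# Cancel every job that is still queued
for job in jobs:
    if job["state"] == "PENDING":
        requests.post(f"{BASE}/v2/matching/job/{job['jobId']}/cancel").raise_for_status()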
Tutorials
Performing a match and output job
The following tutorial will work through a full match and output job.
Prerequisites
-
Experian Match REST API correctly installed and deployed. This includes the installation of the standardization service. Follow the installation guide if you have not completed this. If you encounter any errors whilst working through this tutorial, check that you have correctly installed and deployed the API.
-
Postman collection and environments imported (if using Postman)
Overview
For this tutorial we will be running the API under Tomcat using the default port - in this case port 8080.
For the input we will use the example CSV file provided, connect to a disk-backed HSQL database as the match store and output the results to a new CSV file.
Our input file contains mocked up name, address and email data from the United Kingdom. By performing a match and output job we will be able to see where our data matches, ultimately providing an output file demonstrating where similar records have been clustered together. This would allow us to assess, remove or combine our duplicate records.
You can see a 6 row sample below:
RECORDID | NAME | ADDRESS1 | ADDRESS2 | ADDRESS3 | TOWN | PROVINCE | POSTCODE | DOB | |
---|---|---|---|---|---|---|---|---|---|
123514 |
Mrs Lydia Bright |
Old Brewery |
2 The Maltings |
Milton Abbas |
BLANDFORD |
DT11 0BH |
1985-04-17 |
||
123515 |
Ms Lydia Bright |
Old Brewery |
2 The Maltings |
Milton Abbas |
BLANDFORD |
DT11 0BH |
|||
123516 |
Mrs Lydia Bright |
Old Brewery |
2 The Maltings |
BLANDFORD |
1985-04-17 |
||||
123517 |
Mrs L Bright |
Old Brewery |
2 The Maltings |
Milton Abbas |
BLANDFORD |
DT110BH |
1985-04-05 |
||
123518 |
Mr James Ashworth |
Manor Farm |
WIGTON |
CA7 9RS |
1977-03-13 |
||||
123519 |
Mr James Ashworth |
Manor Farm |
WIGTON |
CUMBRIA |
CA7 9RS |
1977-03-13 |
As you can see there’s a lot of data that looks very similar. Once our job is complete, we should expect to see many of these records clustered together. Whether or not they match will depend on how strict our ruleset is.
Making requests
We can make our requests using a few different methods; you can find more information in the Making requests section of the API reference.
We have provided a Postman collection (FF-HSQL-FF) for this tutorial; you should be able to make each request in the collection sequentially to perform a match and output job. Please ensure that you have imported the Postman collections and environments correctly. The parameters provided when importing the environments will work; however, the sample requests below show the explicit values.
The simplest way to explore the API and make your requests is with the Swagger interactive documentation hosted at http://localhost:8080/matching-rest-api-2.7.1/swagger-ui.html. You should be able to copy and paste the objects from the tutorial into the relevant requests; this will give you a better understanding of each request and the workflow.
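If you prefer to script the tutorial rather than use Postman or Swagger-UI, each step below is an ordinary HTTP request. A minimal sketch using Python and the requests library (an assumption of this sketch, not something shipped with the product), with the ROOT Tomcat deployment assumed by this tutorial; the post_config helper is purely illustrative:
import requests

BASE = "http://localhost:8080"  # ROOT Tomcat deployment, as assumed in this tutorial

def post_config(path, payload):
    """POST a configuration object and return the API's JSON response."""
    response = requests.post(f"{BASE}{path}", json=payload)
    response.raise_for_status()
    return response.json()

# The request bodies shown in the steps below can be passed straight in, e.g.
# datasource = post_config("/v2/configuration/datasource", {...})
# print(datasource["datasourceId"])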
1. Configuring your data source
The first part of configuring your job is to configure the data source. Our data source will be the provided CSV file, DEMO_100.csv.
For simplicity we will store our input and subsequent output files in C:\temp.
To make a data source configuration request we need to make a POST request to /v2/configuration/datasource. As we are running Experian Match on Tomcat, with the webapp deployed at the ROOT, the full url for this request is http://localhost:8080/v2/configuration/datasource. When using the Postman collection we want to use the "Post connection settings for INPUT FLATFILE" request.
We need to post an object where we specify how we connect to our input, in this case our CSV file. We also need to map the input fields so that the API knows what type of data we have in our input file.
Here is what our request body will look like:
{ "connection": { "connectionSettings": { "Path": "C:\\temp\\DEMO_100.csv", "Header": "true", "Quote": "\"" }, "connectionType": "FLATFILE" }, "description": "A FLATFILE input connection", "fieldMappings": [ { "field": 1, "fieldType": "ID" }, { "field": 2, "fieldType": "NAME" }, { "field": 3, "fieldType": "ADDRESS" }, { "field": 4, "fieldType": "ADDRESS" }, { "field": 5, "fieldType": "ADDRESS" }, { "field": 6, "fieldType": "LOCALITY" }, { "field": 7, "fieldType": "PROVINCE" }, { "field": 8, "fieldType": "POSTCODE" } ] }
As you can see we have set our connectionType as FLATFILE and we have set the path to that of our CSV file.
We have also set our field mappings to match that of our input file. For this example we want to match on name and address and so we have mapped the relevant fields.
The response body we get from the API is:
{ "datasourceId": 1, "description": "A FLATFILE input connection", "connection": { "connectionType": "FLATFILE", "connectionSettings": { "Path": "C:\\temp\\DEMO_100.csv", "Header": "true", "Quote": "\"" } }, "fieldMappings": [ { "field": "1", "fieldType": "ID", "fieldGroup": null }, { "field": "2", "fieldType": "NAME", "fieldGroup": null }, { "field": "3", "fieldType": "ADDRESS", "fieldGroup": null }, { "field": "4", "fieldType": "ADDRESS", "fieldGroup": null }, { "field": "5", "fieldType": "ADDRESS", "fieldGroup": null }, { "field": "6", "fieldType": "LOCALITY", "fieldGroup": null }, { "field": "7", "fieldType": "PROVINCE", "fieldGroup": null }, { "field": "8", "fieldType": "POSTCODE", "fieldGroup": null } ] }
The response confirms our connection type and field mappings as well as providing us with a data source ID. Make a note of this ID; we will need it later on.
2. Configuring your output
Next we need to configure our output. As we are outputting to a flat file we will configure an output CSV.
We need to post another object to our API, this time to http://localhost:8080/v2/configuration/output. When using the Postman collection we want to use the "Post connection settings for OUTPUT FLATFILE" request.
This object needs to include connection settings for our output file as well as mapping our input fields to our desired output fields.
Our request body will look like the below:
{ "connection": { "connectionSettings": { "Path": "C:\\temp\\DEMO_100_output.csv", "Header": "true" }, "connectionType": "FLATFILE" }, "overwriteExisting": true, "description": "A FLATFILE output connection", "outputMapping": [ { "inputField": [ { "field": "$RECORD_ID", "source": 0 } ], "outputField": "RecordID" }, { "inputField": [ { "field": 2, "source": 1 } ], "outputField": "Name" }, { "inputField": [ { "field": 3, "source": 1 } ], "outputField": "Address1" }, { "inputField": [ { "field": 4, "source": 1 } ], "outputField": "Address2" }, { "inputField": [ { "field": 5, "source": 1 } ], "outputField": "Address3" }, { "inputField": [ { "field": 6, "source": 1 } ], "outputField": "Town" }, { "inputField": [ { "field": 7, "source": 1 } ], "outputField": "County" }, { "inputField": [ { "field": 8, "source": 1 } ], "outputField": "Postcode" }, { "inputField": [ { "field": 9, "source": 1 } ], "outputField": "Email" }, { "inputField": [ { "field": 10, "source": 1 } ], "outputField": "DateOfBirth" }, { "inputField": [ { "field": "$CLUSTER_ID", "source": 0 } ], "outputField": "Cluster_ID" }, { "inputField": [ { "field": "$MATCH_STATUS", "source": 0 } ], "outputField": "Match_Status" } ] }
As you can see, some of our output fields actually differ from our data source fields. Experian Match allows us to output whichever fields we want from our input file, even if they haven’t been used for matching. For example, we are outputting the EMAIL and DOB fields: this is information we want in our output file, but we don’t want the API to use it for matching on this occasion, hence we omitted it from our input mapping.
We have also included Cluster_ID, Match_Status and RecordID output fields. These fields take $CLUSTER_ID, $MATCH_STATUS and $RECORD_ID as input fields respectively. Fields that start with $ are calculated or system fields; as such, both IDs and the match status are returned by the API rather than derived from the original data source. We therefore set the source to 0. For all other input fields we set the source to 1 to reflect the datasourceId returned to us when configuring our data source.
The response body we get from the API is:
{ "outputId": 1, "description": "A FLATFILE output connection", "connection": { "connectionType": "FLATFILE", "connectionSettings": { "Path": "C:\\temp\\DEMO_100_output.csv", "Header": "true" } }, "outputMapping": [ { "inputField": [ { "field": "$RECORD_ID", "source": 0 } ], "outputField": "RecordID" }, { "inputField": [ { "field": 2, "source": 1 } ], "outputField": "Name" }, { "inputField": [ { "field": 3, "source": 1 } ], "outputField": "Address1" }, { "inputField": [ { "field": 4, "source": 1 } ], "outputField": "Address2" }, { "inputField": [ { "field": 5, "source": 1 } ], "outputField": "Address3" }, { "inputField": [ { "field": 6, "source": 1 } ], "outputField": "Town" }, { "inputField": [ { "field": 7, "source": 1 } ], "outputField": "County" }, { "inputField": [ { "field": 8, "source": 1 } ], "outputField": "Postcode" }, { "inputField": [ { "field": 9, "source": 1 } ], "outputField": "Email" }, { "inputField": [ { "field": 10, "source": 1 } ], "outputField": "DateOfBirth" }, { "inputField": [ { "field": "$CLUSTER_ID", "source": 0 } ], "outputField": "Cluster_ID" }, { "inputField": [ { "field": "$MATCH_STATUS", "source": 0 } ], "outputField": "Match_Status" } ], "filter": "ALL", "overwriteExisting": true }
Once again our response confirms our output connection settings and field mappings. We also get an output ID - as with the data source ID we will use this for configuring our session.
We should expect to see these fields in our output file:
RecordID | Name | Address1 | Address2 | Address3 | Town | County | Postcode | DateOfBirth | Cluster_ID | Match_Status |
---|
3. Configuring your rules
We now need to configure the rules we will be using to control the stringency of our matching job. We do this by making a post request to http://localhost:8080/v2/configuration/rule. When using the Postman collection we want to use the "Post rules" request.
We need to provide a JSON escaped string containing our rules. We have used the default ruleset for the United Kingdom, which you can find more information about in our rules section. We can also give our rules a description; this is useful if we plan to use different rulesets for subsequent jobs.
Our request object will look like the below:
{ "description": "Default Matching Rules for the United Kingdom", "ruleVersion": "v1", "rules": "<As in file>" }
The response body we will get from the API is:
{ "ruleSetId": 1, "description": "Default Matching Rules for the United Kingdom", "rules": "<As in file>", "ruleVersion": "v1" }
Our response will return the description, rules and version we posted along with a ruleSet ID which will be used when we configure our session.
4. Configuring the blocking keys
We will use the default United Kingdom blocking keys from the GET request to http://localhost:8080/v2/configuration/blockingKey/default/GBR
[ { "description": "FullPostcode", "countryCode": "GBR", "maxBlockSize": 200, "elementSpecifications": [ { "elementType": "POSTCODE", "includeFromNChars": 5, etc...
To use these blocking keys with our session we need to make a POST request to http://localhost:8080/v2/configuration/blockingKey including the keys. Copy the response from the GET request and POST this to create the keys. The newly generated keys will each have a blockingKeyId for use in the session configuration.
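Scripted, this get-then-post step might look like the following: a sketch using Python and the requests library (an assumption of this sketch), with the same ROOT Tomcat base URL as the rest of the tutorial:
import requests

BASE = "http://localhost:8080"

# Fetch the best practice blocking keys for the United Kingdom...
defaults = requests.get(f"{BASE}/v2/configuration/blockingKey/default/GBR").json()

# ...and POST them back unchanged to create them for use in our session.
created = requests.post(f"{BASE}/v2/configuration/blockingKey", json=defaults)
created.raise_for_status()

# Each created key has a blockingKeyId for use in the session configuration.
print([key["blockingKeyId"] for key in created.json()])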
5. Configuring your session
Now that we have configured our data source, output, rules and blocking keys we can configure our session.
We need to post the IDs returned to us in our previous requests as well as providing connection settings for our SQL database to be used as the match store. We make the POST request to http://localhost:8080/v2/session. When using the Postman collection we want to use the "Post session" request.
For the blocking keys we want to use IDs 1 to 9 which we set earlier. As we haven’t configured any other data sources, outputs or rules, our IDs should all be 1. We can also make use of the HSQL functionality provided by the API. This allows us to specify a disk-backed store.
Our request body will look like the below:
{ "datasourceIds": [ 1 ], "description": "A FF-HSQL-FF matching session", "matchStoreConnection": { "connectionSettings": { "JdbcUrl": "jdbc:hsqldb:file:c:\\temp\\hsqldb", "JdbcDriver": "org.hsqldb.jdbc.JDBCDriver", "UserName": "sa", "Password": "", "Table": "DEMO_100_FF" }, "connectionType": "JDBC" }, "outputId": 1, "ruleSetId": 1, "blockingKeyIds": [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ] }
As we are making use of HSQL we have set the connection type to JDBC and our JdbcUrl to a file in our c:\\temp
folder.
We also need to provide the right driver. The HSQL driver is packaged with the API.
The response body we get from the API is:
{ "sessionId": 1, "description": "A FF-HSQL-FF matching session", "matchStoreConnection": { "connectionType": "JDBC", "connectionSettings": { "JdbcUrl": "*****", "JdbcDriver": "org.hsqldb.jdbc.JDBCDriver", "UserName": "*****", "Password": "*****", "Table": "DEMO_100_FF" } }, "blockingKeyIds": [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ], "datasourceIds": [ 1 ], "outputId": 1, "ruleSetId": 1 }
Our response will return us the same JSON as we posted along with a session ID. We will use this sessionID when performing a match job.
6. Run your match and output job
Our final step is to run the match and output job. We need to post the sessionID to http://localhost:8080/v2/matching/job/matchAndOutput. When using the Postman collection we want to use the "Match and Output" request.
We can also give our job a description and specify a callback URI; we will leave the callback blank for this tutorial.
Our request body will look like the below:
{ "description": "FF-SQL-FF match and output", "sessionId": 1 }
The initial response body we get from the API is:
{ "jobId": 1, "description": "FF-SQL-FF match and output", "createTime":"2017-06-07T18:55:30.35", "startTime": "2017-06-07T18:55:34.87", "finishTime": null, "progress": 0, "message": null, "state": "PENDING" }
The response shows:
-
the unique job ID
-
the job description
-
the time that the job was first created
-
the time that the job was started
-
the time that the job was finished
-
the job progress
-
a message detailing the state of the job
-
the state of the job (i.e. PENDING/RUNNING/CANCELLED/FAILED/FINISHED)
To check the status of our job we need to make a GET request to http://localhost:8080/v2/matching/job/1. The 1 refers to the jobID returned to us when we first scheduled the job. You could also make a GET request to http://localhost:8080/v2/matching/job to return the status of all jobs.
We will get the following response if our job was successful:
{ "jobId": 1, "description": "FF-SQL-FF match and output", "createTime":"2017-06-07T18:55:30.35", "startTime": "2017-06-07T18:55:34.87", "finishTime": "2017-06-07T18:55:59.381", "progress": 100, "message": "Job Complete.", "state": "FINISHED" }
We will now be able to navigate to our C:\temp folder and view our DEMO_100_output.csv
output file.
If you sort your output file by Cluster_ID you will be able to see which records have been clustered together.
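One way to do this programmatically is sketched below, assuming the pandas library is available (an assumption of this sketch, not a product requirement) and that the output file was written to C:\temp as configured in step 2:
import pandas as pd

# Load the output file written by the match and output job...
output = pd.read_csv(r"C:\temp\DEMO_100_output.csv")

# ...and sort it so records in the same cluster appear together.
print(output.sort_values("Cluster_ID").to_string(index=False))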
A 6 row sample of the output is shown below:
Name | Address1 | Address2 | Address3 | Town | County | Postcode | DateOfBirth | RecordID | Cluster_ID | Match_Status | |
---|---|---|---|---|---|---|---|---|---|---|---|
Mrs Lydia Bright |
Old Brewery |
2 The Maltings |
Milton Abbas |
BLANDFORD |
DT11 0BH |
1985-04-17 |
123514 |
17 |
CLOSE |
||
Ms Lydia Bright |
Old Brewery |
2 The Maltings |
Milton Abbas |
BLANDFORD |
DT11 0BH |
123515 |
17 |
CLOSE |
|||
Mrs Lydia Bright |
Old Brewery |
2 The Maltings |
BLANDFORD |
1985-04-17 |
123516 |
17 |
PROBABLE |
||||
Mrs L Bright |
Old Brewery |
2 The Maltings |
Milton Abbas |
BLANDFORD |
DT110BH |
1985-04-05 |
123517 |
17 |
CLOSE |
||
Mr James Ashworth |
Manor Farm |
WIGTON |
CA7 9RS |
1977-03-13 |
123518 |
3 |
EXACT |
||||
Mr James Ashworth |
Manor Farm |
WIGTON |
CUMBRIA |
CA7 9RS |
1977-03-13 |
123519 |
3 |
EXACT |
Experian Match has clustered a number of our records together, clearly showing that, as we suspected, we had duplicate data in our data source.
Match store searching
After a match job has been run, you can search the match store to find records which could potentially match against your target record. This allows the Match Store to be checked for existing matches and protects against duplicates being entered into the Match Store. Searching may also be a pre-condition for performing transactional maintenance.
Basic searching
Searches can be performed against a built Match Store with very little configuration. Using the example on the Overview page, a search can be made as follows:
/v2/search/1
{
"search": {
"Forename": "Joe",
"Surname": "Bloggs",
"Address1": "10 Downing St",
"Address2": "London"
}
}
The session ID is passed as a parameter in the path. The search value is a schema-less JSON object; the keys correspond to the input mappings specified in the Data Source field mappings.
Here, Address3 was not specified; it can be omitted and will be treated as blank when searching.
The response object will look like this:
{
"results": [
{
"record": {
"Cust_ID": "1",
"FirstName": "Joseph",
"LastName": "Bloggs",
"Cluster": 1
},
"matchStatus": "EXACT"
},
{
"record": {
"Cust_ID": "2",
"FirstName": "Joe",
"LastName": "Bloggs",
"Cluster": 1
},
"matchStatus": "PROBABLE"
},
{
"record": {
"Cust_ID": "3",
"FirstName": "Jo",
"LastName": "Blog",
"Cluster": 2
},
"matchStatus": "PROBABLE"
} // etc...
]
}
Without any additional configuration, the response will be a list of objects containing a record and its corresponding match status with respect to the search term. The record contains fields from the Output Mapping which could potentially match the provided search term. If no output mapping has been configured (e.g. because you have only run a MATCH job), then the output will contain the fields recordID, sourceID, and clusterID.
The Search result order is in descending order of matching status. So EXACT matches are returned first in the collection.
Potential matches are determined using the rule set configured in the Session.
If you wish to override the fields used for searching, the rules used for matching, or the output fields, additional configuration is required.
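As with the other endpoints, searches can also be made programmatically. A minimal sketch using Python and the requests library (an assumption of this sketch), with the tutorial's ROOT Tomcat base URL and session ID 1 as in the example above:
import requests

BASE = "http://localhost:8080"
search_terms = {
    "search": {
        "Forename": "Joe",
        "Surname": "Bloggs",
        "Address1": "10 Downing St",
        "Address2": "London",
    }
}

# POST the search object; the path parameter is the session ID.
results = requests.post(f"{BASE}/v2/search/1", json=search_terms).json()
for result in results["results"]:
    print(result["matchStatus"], result["record"])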
Advanced searching
To use different field mappings from your match job, you must create a new data source, and register it with a new session.
First, create a new data source configuration. Refer to Overview Section 1: Data source configuration.
You can specify REST
as a data source type, to prevent this configuration from being used in normal match
or matchAndOutput
jobs.
/v2/configuration/datasource
{
"description": "A data source use only for Search",
"connection": {
"connectionType": "REST"
}
"fieldMappings": [
// etc...
]
}
You also don’t need to supply any connectionSettings
, as there are none.
For output mappings, follow a similar approach of using REST
as a data source type.
Note: If you have specified a REST data source type, you cannot use this as an "inputField" source in the output mapping. Doing so will result in an error.
If you wish to use separate rules or separate blocking keys for matching, create these configurations in the standard way. See the tuning section for more information.
Next, you need to create a new Session to contain your overridden settings:
/v2/session
{
"description": "Search session",
// These settings should be the same as your existing Session.
"matchStoreConnection": {
"connectionSettings": {
"JdbcUrl": "jdbc:sqlserver://DBSERVER;jdbcDatabaseName=CustomerDB",
"JdbcDriver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
"Table" : "Cust_Index",
"UserName" : "matchingUser",
"Password" : "Password123"
},
"connectionType": "JDBC"
},
"datasourceIds": [
1, // The existing data source ID
2 // Your new data source ID (REST)
],
"ruleSetId": 2, // Your new rule set ID
"outputId": 2, // Your new output mapping (REST)
"blockingKeyIds": [
1,
2
] // Your blocking key IDs
}
Your search can then be performed using this new Session, which will have the new field mappings for the search terms.
Find by Cluster ID or Record ID.
After a match
job has run, you may wish to find clusters from a single Record ID or a Cluster ID.
When searching using a Cluster ID, the result will be all records in the specified cluster. The path parameters are the Session ID and the Cluster ID being searched for.
/v2/search/1/23
The response object will look like this:
{
"results": [{
"Cust_ID": "1",
"FirstName": "Joseph",
"LastName": "Bloggs",
"Cluster": 1
} // etc...
]
}
When searching using a Record ID, the result will be all records in the same cluster, including the specified record ID. The path parameters are the Session ID, and the Source ID and Record ID being searched for.
/v2/search/1/1/123
The response object will look like this:
{
"results": [{
"Cust_ID": "1",
"FirstName": "Joseph",
"LastName": "Bloggs",
"Cluster": 1
} // etc...
]
}
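Both path forms can be called in the same way as the other search requests. A minimal sketch using Python and the requests library (an assumption of this sketch), with the same base URL and the example session, cluster and record IDs used above:
import requests

BASE = "http://localhost:8080"

# All records in cluster 23, for session 1
by_cluster = requests.get(f"{BASE}/v2/search/1/23").json()

# All records clustered with record 123 from source 1, again for session 1
by_record = requests.get(f"{BASE}/v2/search/1/1/123").json()

print(by_cluster["results"])
print(by_record["results"])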
Transactional maintenance
After a match job has been run, you can maintain data in the match store to bring it into line with changes in the source data. Transactional maintenance functionality is presented via the REST interface, allowing Records to be added to, updated in or deleted from the Match Store. The Match Store search functions may be used to identify Records requiring maintenance.
In performing a maintenance operation, the request will apply all relevant Matching pipeline tasks to the record. This includes Standardisation, Keying and Blocking where the record is being added or updated. Finally, the record is re-scored and clustered, potentially leading to changes to any other Records clustered with or previously clustered with that Record.
Caveats
-
In order to perform maintenance on the Match Store there should be no jobs already running
-
The maintenance function will lock the Match Store for the duration of the request
-
Actions against the Match Store are performed as atomic transactions, rolling back in the event of any error
-
The data source pointed to by the datasource id must not be a flat file source
Available functions
addMatchStoreRecord
This function captures the addition of a new Record to a data source.
An example request:
/v2/matching/addMatchStoreRecord
{ "sessionId": "1", "datasourceId": "1", "recordId": "123458" }
Where:
-
sessionId: the configured session
-
datasourceId: the data source containing the Record being added
-
recordId: the new unique Id
This operation involves Standardisation, Keying and Blocking of the new Record followed by re-scoring and clustering with all Records related by scoring.
updateMatchStoreRecord
This function captures the case of some change to an existing Record in a data source. Typically this would represent a change to some attribute(s) on the Record that may affect the match characteristics of the Record.
An example request:
/v2/matching/updateMatchStoreRecord
{ "sessionId": "1", "datasourceId": "1", "recordId": "123458" }
Where:
-
sessionId: the configured session
-
datasourceId: the data source containing the Record being updated
-
recordId: the existing unique Id
This operation involves Standardisation, Keying and Blocking of the updated Record followed by re-scoring and clustering with all Records related by scoring.
deleteMatchStoreRecord
This function captures the case of an existing Record being removed from a data source.
An example request:
/v2/matching/deleteMatchStoreRecord
{ "sessionId": "1", "datasourceId": "1", "recordId": "123458" }
Where:
-
sessionId: the configured session
-
datasourceId: the data source containing the Record being removed
-
recordId: the existing unique Id
This operation involves re-clustering of all Records related to the Record being deleted.
Results
A result JSON object is produced upon successful completion of the maintenance request. This contains the identifier of the affected Record in the Match Store along with the clustering outcome (see below), the Cluster Id to which the Record has been allocated and a collection containing the full set of Record ids in the same Cluster. In addition, a changes collection contains the Clusters and their member identifiers that have been affected by the operation.
Clustering outcome
When a maintenance operation is performed, the final stage is to regenerate the Clusters for all Records affected by the operation, i.e. scored together or previously clustered with the Record in question. The outcome of re-clustering may be one of several cases.
-
ADD_NEW: a new Cluster has been generated for this Record (i.e. an Add of a Record with no other matches)
-
ADD_EXISTING: this Record has been added to an existing Cluster (i.e. an Add of a Record along with its matches)
-
MERGE: the transaction has led to two Clusters merging (e.g. the Record forms a bridge between the two)
-
SPLIT: the transaction has led to a Cluster splitting (e.g. the Record previously formed a bridge between the two)
-
DELETE_RECORD: this Record has been removed from an existing Cluster but other Records remain in that Cluster
-
DELETE_CLUSTER: this Record has been removed from an existing Cluster and no other Records remain in that Cluster
-
COMPLEX: some combination of more than one of the above cases occurred in the same transaction
-
NO_CHANGE: no changes occurred to the clustering arrangements
Example
The below result shows the outcome for an update where the change to the Record has caused an existing Cluster to split into two. The changes collection shows all affected Records with their previous and new Cluster ids.
In detail, the original state has a Cluster comprising a number of records. When the new Record was presented, it caused two of the original Cluster’s Records to be split out of that Cluster into a new one, along with the new Record. In the changes collection we can see these two Records recording the original Cluster Id and the new one as well as the new Record, whose original Cluster Id is null. Records remaining in the original Cluster are not shown as they have not been changed.
{ "results": [ { "outcome": "SPLIT", "recordId": { "recId": "123470", "source": 1 }, "clusterId": 4, "recordsInCluster": [ { "recId": "123469", "source": 1 }, { "recId": "123471", "source": 1 }, { "recId": "123470", "source": 1 } ], "changes": [ { "recordId": { "recId": "123471", "source": 1 }, "prevClusterId": 1, "clusterId": 4 }, { "recordId": { "recId": "123470", "source": 1 }, "prevClusterId": null, "clusterId": 4 }, { "recordId": { "recId": "123469", "source": 1 }, "prevClusterId": 1, "clusterId": 4 } ], "oldClusters": [ { "clusterId": 1, "recordsInCluster": [ { "recId": "123469", "source": 1 }, { "recId": "123471", "source": 1 } ] } ], "newClusters": [ { "clusterId": 4, "recordsInCluster": [ { "recId": "123469", "source": 1 }, { "recId": "123470", "source": 1 }, { "recId": "123471", "source": 1 } ] } ] } ] }
Errors
A transactional request will be rejected in any of the following circumstances:
-
The specified session is not configured
-
The specified data source is not configured or cannot be accessed
-
The Match Store is unavailable
-
Another job is already in progress
-
The request is to Add but the Record Id already exists for that source in the Match Store
-
The request is to Add or Update but the Record Id does not exist at the specified source
-
The request is to Update or Delete but the Record Id does not exist for that source in the Match Store
-
The request is to Add or Update but the Record Id exists more than once at the specified source
Note that for a Delete no check is made that the Record Id has been deleted from the data source: this is the responsibility of the client workflow.
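A transactional call can therefore fail for reasons outside the caller's control, for example when another job is already running, so it is worth handling rejections explicitly. A minimal sketch using Python and the requests library (an assumption of this sketch), reusing the addMatchStoreRecord request shown earlier and the result shape described above:
import requests

BASE = "http://localhost:8080"  # ROOT Tomcat deployment, as in the tutorial
payload = {"sessionId": "1", "datasourceId": "1", "recordId": "123458"}

response = requests.post(f"{BASE}/v2/matching/addMatchStoreRecord", json=payload)
if response.ok:
    # The result object described above: clustering outcome, cluster id and affected records.
    for result in response.json()["results"]:
        print(result["outcome"], result["clusterId"], result["recordsInCluster"])
else:
    # Rejections include the cases listed above, e.g. another job already in progress.
    print("Request rejected:", response.status_code, response.text)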
API Reference
Base URL
As an example, a deployment on the default Tomcat port would have the base URL http://localhost:8080/matching-rest-api-2.7.1/
Making requests
If you would like to make requests to the API without integrating you can use one of the following methods:
Using the Swagger-UI interactive documentation
-
Swagger-UI documentation is hosted by the service itself and can be accessed at http://localhost:{port}/matching-rest-api-2.7.1/swagger-ui.html
-
The Swagger-UI documentation allows you to make requests against the API from within the documentation.
Using the Postman collection
-
Before using Postman to make requests, you will need to import the Matching collections. You can do this by clicking import in the top left and navigating to the postman collection JSON files provided in the package. These can be found under
Integration Samples\Postman\collections
. -
It is recommended that you use the Environments feature of Postman to parameterise values such as the hostname so you can use the same scripts against dev and production instances. You can import pre-configured environments and parameters from the package under Integration Samples\Postman\environments. To import environments, click the cog in the top right and select Manage Environments.
-
The Postman collection contains example requests for setting up input, index, and output configurations, and then running a full matching job.
-
The collection is set up to use the provided sample data which should be copied to
C:\temp\
. -
The collection contains a set of best practice rules for the United Kingdom.
Using the Swagger-CodeGen program
-
Generate a client library in multiple languages using the code generator, which can be downloaded from https://github.com/swagger-api/swagger-codegen.
-
Follow the documentation and it will generate a client library for your language of choice.
-
You can generate a C# client by starting the Matching service and running the following command:
java -jar swagger-codegen-cli-2.2.1.jar generate -i http://localhost:{port}/matching-rest-api-2.7.1/v2/api-docs -l csharp -o MatchingClient
-
It’s possible to customise the code generated by this tool. For example, here’s how you could generate a C# client for .NET v3.5:
java -jar swagger-codegen-cli-2.2.1.jar generate -i http://localhost:{port}/matching-rest-api-2.7.1/v2/api-docs -l csharp -o MatchingClient -DtargetFramework="v3.5"
-
For more information about customising the generator, see here
Resources
Configuration
Configuration for input data source, output sink and rule set.
Get all output configurations.
GET /v2/configuration/output
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
OutputConfigModel array |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Create a new configuration for connecting to an output sink.
POST /v2/configuration/output
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
configuration |
A description of where to store output data. |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
201 |
Configuration created successfully. |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get the best practice blocking keys by Country ISO 3166-1 alpha-3.
GET /v2/configuration/blockingKey/default/{countryISO3}
Description
Returns the best practice blocking keys per country.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
countryISO3 |
Country of configuration |
true |
string |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
BlockingKeyConfigModel array |
404 |
No Blocking keys found for specified country. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get all blocking keys
GET /v2/configuration/blockingKey
Description
Returns all the blocking keys.
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
BlockingKeyConfigModel array |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Creates one or more blocking key configurations.
POST /v2/configuration/blockingKey
Description
Creates blocking key configurations given an array of specifications.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
configurations |
Blocking key configuration array. |
true |
BlockingKeyConfigModel array |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
BlockingKeyConfigModel array |
201 |
Configuration created successfully. |
BlockingKeyConfigModel array |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Delete a configuration by ID
DELETE /v2/configuration/output/{outputId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
outputId |
ID of configuration |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
204 |
Successfully deleted |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Update an output connection configuration by ID.
PUT /v2/configuration/output/{outputId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
outputId |
ID of configuration |
true |
integer (int32) |
|
BodyParameter |
configuration |
A description of where and how to store output data. |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get an output connection configuration by ID.
GET /v2/configuration/output/{outputId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
outputId |
ID of configuration |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Deletes a blocking key configuration by ID
DELETE /v2/configuration/blockingKey/{blockingKeyId}
Description
Deletes a specified blocking key configuration.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
blockingKeyId |
ID of configuration |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
204 |
Successfully deleted |
|
404 |
BlockingKey with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get a blocking key configuration by ID.
GET /v2/configuration/blockingKey/{blockingKeyId}
Description
Returns the specified blocking key configuration.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
blockingKeyId |
ID of configuration |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
404 |
BlockingKey with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Delete a rule by ID
DELETE /v2/configuration/rule/{ruleId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
ruleId |
ID of rule. |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
204 |
Successfully deleted |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Update a rule by ID.
PUT /v2/configuration/rule/{ruleId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
ruleId |
ID of configuration |
true |
integer (int32) |
|
BodyParameter |
configuration |
A description of matching rules. |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get a rule by ID
GET /v2/configuration/rule/{ruleId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
ruleId |
ID of configuration |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get all rule configurations
GET /v2/configuration/rule
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
RuleConfigModel array |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Create a new configuration that defines matching rules.
POST /v2/configuration/rule
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
configuration |
A description of matching rules. |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
201 |
Configuration created successfully. |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get the best practice blocking keys.
GET /v2/configuration/blockingKey/default
Description
Returns the best practice blocking keys.
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
BlockingKeyConfigModel array |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get all data source configurations.
GET /v2/configuration/datasource
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
DataSourceConfigModel array |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Create a new configuration to connect to a data source.
POST /v2/configuration/datasource
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
configuration |
A description of how to connect to the data source, what the data represents, and how it should be used. |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
201 |
Configuration created successfully. |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Delete a configuration by ID.
DELETE /v2/configuration/datasource/{datasourceId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
datasourceId |
ID of configuration. |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
204 |
Successfully deleted. |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Update a data source configuration by ID.
PUT /v2/configuration/datasource/{datasourceId}
Description
Sets a data source configuration using the supplied object. The updated values will be returned; however, sensitive connection details such as a username or password will be masked.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
datasourceId |
ID of configuration. |
true |
integer (int32) |
|
BodyParameter |
configuration |
A description of how to connect to the data source, what the data represents, and how it should be used. |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get a data source configuration by ID.
GET /v2/configuration/datasource/{datasourceId}
Description
Returns a data source configuration object. Sensitive connection details such as a username or password will be masked.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
datasourceId |
ID of configuration. |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Configuration with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Matching
Run a full matching job or individual stages.
Get all matching jobs
GET /v2/matching/job
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Schedule a job which will find matching records and write them to the configured data output sink.
POST /v2/matching/job/matchAndOutput
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
request |
The job session and configuration to use. |
false |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
202 |
Success |
|
400 |
Invalid MatchingRequest |
string |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Cancel a running or pending job.
POST /v2/matching/job/{jobId}/cancel
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
jobId |
ID of job |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Job with supplied ID doesn’t exist. |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Output matching records.
POST /v2/matching/job/outputMatches
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
request |
The job session and configuration to use. |
false |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
202 |
Success |
|
400 |
Invalid MatchingRequest |
string |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Schedule a job to group records from the configured data sources into clusters.
POST /v2/matching/job/match
Description
Schedule a job to read all records from the data sources configured in the session and put the records into clusters.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
request |
The job session and configuration to use. |
false |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
202 |
Success |
|
400 |
Invalid MatchingRequest |
string |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get the status of a job
GET /v2/matching/job/{jobId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
jobId |
ID of job |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Job with supplied ID doesn’t exist. |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Search
Search records in the system.
Find the source and record IDs of records in the specified cluster.
GET /v2/search/{sessionId}/{clusterId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
sessionId |
sessionId |
true |
integer (int32) |
|
PathParameter |
clusterId |
clusterId |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Record(s) found. |
|
404 |
Cluster ID does not exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Find the cluster of records matching the search object.
GET /v2/search/{sessionId}/{sourceId}/{recordId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
sessionId |
sessionId |
true |
integer (int32) |
|
PathParameter |
recordId |
recordId |
true |
string |
|
PathParameter |
sourceId |
sourceId |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Record(s) found. |
|
404 |
Record ID does not exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Find the cluster of records matching the search object.
POST /v2/search/{sessionId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
sessionId |
sessionId |
true |
integer (int32) |
|
BodyParameter |
request |
request |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Record(s) found. |
|
404 |
No matches found. |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Session
Configure a session to use with matching operations.
Delete a configuration by ID
DELETE /v2/session/{sessionId}
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
sessionId |
ID of configuration |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
204 |
Successfully deleted |
|
404 |
Session with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Update a session configuration by ID.
PUT /v2/session/{sessionId}
Description
Updates a session with the supplied settings.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
sessionId |
ID of configuration |
true |
integer (int32) |
|
BodyParameter |
configuration |
A description of the session including data sources, output settings and match rules. |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
400 |
Unable to update session. |
string |
404 |
Session with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get a session configuration by ID.
GET /v2/session/{sessionId}
Description
Returns the specified session configuration.
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
sessionId |
ID of configuration |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Session with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get all session configurations
GET /v2/session
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
SessionConfigModel array |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Create a new session.
POST /v2/session
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
configuration |
A description of the session including data sources, output settings and match rules. |
true |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
OK |
|
201 |
Configuration created successfully. |
|
400 |
Unable to create session. |
string |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Get information about blocking keys from a given session
GET /v2/session/{sessionId}/blockingKeyStatistics
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
PathParameter |
sessionId |
ID of configuration |
true |
integer (int32) |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
|
404 |
Session with supplied ID doesn’t exist. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
System
Retrieve status and license information.
Get the system status.
GET /v2/system/status
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Success |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Apply an update code.
POST /v2/system/applyUpdateCode
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
QueryParameter |
updateCode |
updateCode |
false |
string |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Feature enabled successfully. |
|
400 |
Unable to apply update code. See response for details. |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Transaction
Perform transactional maintenance of records in the system.
Delete a record from the match store.
POST /v2/transaction/delete
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
request |
The session and record information for the match. |
false |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Match record deleted. |
|
400 |
Invalid MatchStoreRequest |
No Content |
404 |
Record ID does not exist. |
No Content |
409 |
Another job is already running, please try again once it has completed. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Update an existing record in the match store.
POST /v2/transaction/update
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
request |
The session and record information for the match. |
false |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Match record updated |
|
400 |
Invalid MatchStoreRequest |
No Content |
404 |
Record ID does not exist. |
No Content |
409 |
Another job is already running, please try again once it has completed. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Add a new record to the match store.
POST /v2/transaction/add
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
BodyParameter |
request |
The session and record information for the match. |
false |
Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Match record added. |
|
400 |
Invalid MatchStoreRequest |
No Content |
404 |
Record ID does not exist. |
No Content |
409 |
Another job is already running, please try again once it has completed. |
No Content |
Consumes
-
application/json
Produces
-
application/json;charset=UTF-8
Definitions
BlockSizeStatistics
Name | Description | Required | Schema | Default |
---|---|---|---|---|
blockCount |
Number of blocks. |
false |
integer (int32) |
|
maximum |
Maximum block size. |
false |
integer (int32) |
|
mean |
Mean block size. |
false |
number (double) |
|
median |
Median block size. |
false |
number (double) |
|
minimum |
Minimum block size. |
false |
integer (int32) |
|
standardDeviation |
Standard deviation of block size. |
false |
number (double) |
BlockingKeyConfigModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
blockingKeyId |
Integer ID to refer to the key. |
false |
integer (int32) |
|
countryCode |
ISO3 code of country blocking key is to be used with. |
false |
string |
|
description |
Descriptive name for blocking key. |
false |
string |
|
elementSpecifications |
Array of element specifications. Blocking keys are built from the listed elements in order. |
true |
||
maxBlockSize |
The maximum value for which records with the same blocking key will be considered as candidate matching pairs. The default value is 200. |
false |
integer (int32) |
BlockingKeyElementAlgorithm
Name | Description | Required | Schema | Default |
---|---|---|---|---|
name |
KeyType algorithm name. |
false |
enum (NO_CHANGE, DOUBLE_METAPHONE, DOUBLE_METAPHONE_FIRST_WORD, NYSIIS, SIMPLIFIED_STRING, SOUNDEX, CONSONANT, INITIAL, START_SUBSTRING, MIDDLE_SUBSTRING, END_SUBSTRING) |
|
properties |
Optional properties for algorithm. |
false |
object |
BlockingKeyElementSpecificationModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
algorithm |
Keying algorithm used to key the element. |
false |
||
elementGroup |
The group the element belongs to in the data source configuration. |
false |
string |
|
elementModifiers |
Array of element modifiers to use if available. The list is processed in order and the first populated normalised form for an element will be used. For example given: |
false |
enum (NONE, REMOVENOISE, STANDARDSPELLING, STANDARDABBREVIATION, ROOTNAME, STANDARDFORMAT, DERIVED) array |
|
elementType |
Type of element to key and use at this point in the blocking key. |
true |
enum (ID, SOURCE_ID, CLUSTER_ID, UNCLASSIFIED, COMPANY, POSITION, DIVISION, ORGANISATION, NAME, TITLE, FORENAMES, SURNAME_PREFIX, SURNAME, SURNAME_SUFFIX, HONORIFICS, GENDER, ADDRESS, PREMISE_AND_STREET, BUILDING_NUMBER, BUILDING_DESCRIPTION, BUILDING_TYPE, SUBBUILDING_NUMBER, SUBBUILDING_DESCRIPTION, SUBBUILDING_TYPE, MINORSTREET_NUMBER, MINORSTREET_PREDIRECTIONAL, MINORSTREET_DESCRIPTION, MINORSTREET_TYPE, MINORSTREET_POSTDIRECTIONAL, MAJORSTREET_NUMBER, MAJORSTREET_PREDIRECTIONAL, MAJORSTREET_DESCRIPTION, MAJORSTREET_TYPE, MAJORSTREET_POSTDIRECTIONAL, POBOX_NUMBER, POBOX_DESCRIPTION, DOUBLEDEPENDENTLOCALITY, DEPENDENTLOCALITY, LOCALITY, PROVINCE, POSTCODE, COUNTRY, COUNTRY_ISO3, PHONE, EMAIL, EMAIL_LOCAL, EMAIL_DOMAIN, GENERIC_STRING, DATE) |
|
includeFromNChars |
Only include the keyed element in a blocking key if it is N or more characters in length. Note: if specified, no blocking key is created when one or more of the configured elements do not meet this criterion. |
false |
integer (int32) |
|
truncateToNChars |
Truncate the keyed element to N characters in length. |
false |
integer (int32) |
BlockingKeyStatistic
Name | Description | Required | Schema | Default |
---|---|---|---|---|
blockingKeyId |
ID of the blocking key. |
false |
integer (int32) |
|
description |
Description of blocking key. |
false |
string |
|
maxBlockSize |
Maximum block size. |
false |
integer (int32) |
|
numThresholdExceptions |
Number of blocks whose size was greater than the threshold. |
false |
integer (int32) |
|
statisticsExcludingExceptions |
Blocking key statistics excluding exceptions. |
false |
||
statisticsIncludingExceptions |
Blocking key statistics including exceptions. |
false |
BlockingKeyStatistics
Name | Description | Required | Schema | Default |
---|---|---|---|---|
blockingKeyStatistics |
Statistics for the blocking key. |
false |
BlockingKeyStatistic array |
ClusterChange
Name | Description | Required | Schema | Default |
---|---|---|---|---|
clusterId |
false |
integer (int32) |
||
prevClusterId |
false |
integer (int32) |
||
recordId |
false |
ClusterModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
clusterId |
ID of the cluster. |
false |
integer (int32) |
|
recordsInCluster |
Information about the particular record in the cluster. |
false |
RecordId array |
ConnectionConfigModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
connectionSettings |
The required settings to connect to your specific datasource. |
true |
object |
|
connectionType |
The type of datasource to connect to. |
true |
enum (FLATFILE, JDBC, MONGO, JMS, REST) |
DataSourceConfigModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
connection |
The type of datasource to connect to and the required settings. |
true |
||
datasourceId |
Datasource ID returned by the API; it is not included in the request. |
false |
integer (int32) |
|
description |
Description of the datasource. |
false |
string |
|
fieldMappings |
The field mappings for your datasource. This includes field name and type. |
true |
FieldMappingModel array |
FieldMappingModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
field |
The field to be used for mapping. For flat files this will be column number, starting from 1. For databases this will be the column name. |
true |
string |
|
fieldGroup |
The field group name. Can be used to group related fields together. |
false |
string |
|
fieldType |
The type of the field. |
true |
enum (ID, NAME, TITLE, FORENAMES, SURNAME, ADDRESS, PREMISE_AND_STREET, LOCALITY, PROVINCE, POSTCODE, COUNTRY, GENERIC_STRING, DATE, PHONE, EMAIL) |
HttpEntity
Name | Description | Required | Schema | Default |
---|---|---|---|---|
body |
false |
object |
JobStatus
Name | Description | Required | Schema | Default |
---|---|---|---|---|
createTime |
false |
string (date-time) |
||
description |
false |
string |
||
finishTime |
false |
string (date-time) |
||
jobId |
false |
integer (int32) |
||
message |
false |
string |
||
progress |
false |
number (float) |
||
startTime |
false |
string (date-time) |
||
state |
false |
enum (PENDING, RUNNING, FINISHED, FAILED, CANCELLED) |
LicenseStatus
Name | Description | Required | Schema | Default |
---|---|---|---|---|
dllLocation | | false | string | |
isLicensed | | false | boolean | |
licenseExpiry | | false | string (date) | |
licenseFolder | | false | string | |
messages | | false | string | |
updateKey | | false | string | |
MatchJobRequest
Name | Description | Required | Schema | Default |
---|---|---|---|---|
callbackUri | URI for a callback function. | false | string | |
description | Match job description. | false | string | |
sessionId | ID of the session to use for the match job. | true | integer (int32) | |
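A minimal MatchJobRequest sketch follows; only sessionId is required, and the session ID and callback URI shown are illustrative.

```python
# Illustrative MatchJobRequest; only sessionId is required.
match_job_request = {
    "sessionId": 1,                                       # required
    "description": "Initial bulk match",                  # optional
    "callbackUri": "https://example.com/match-callback",  # optional, illustrative URI
}
```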
MatchRequest
Name | Description | Required | Schema | Default |
---|---|---|---|---|
datasourceId | The ID of the datasource. | true | integer (int32) | |
recordId | The ID of the record to add/update/delete. | true | string | |
sessionId | The ID of the session. | true | integer (int32) | |
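An illustrative MatchRequest for a transactional add, update or delete is sketched below; all three fields are required and the values are invented.

```python
# Illustrative MatchRequest; all fields are required, values are invented.
match_request = {
    "sessionId": 1,
    "datasourceId": 3,
    "recordId": "CUST-000123",
}
```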
OutputConfigModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
connection | The type of data output and the settings required to connect to it. | true | | |
description | Description of the output configuration. | false | string | |
outputId | Output configuration ID returned by the API; it is not included in the request. | false | integer (int32) | |
outputMapping | The output field mappings, including field names and types. | true | OutputFieldMappingModel array | |
overwriteExisting | Whether to overwrite existing records. Applies to flat files only, where the file will be overwritten; all other datasource types are appended to when writing data. | false | boolean | |
OutputFieldMappingModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
inputField | The input field, given as the field name (column number for flat files) and the ID of the datasource. The field name can also be a system field such as $RECORD_ID or $CLUSTER_ID. | true | SourceFieldSelector array | |
outputField | The field name or column number to output to. | true | string | |
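The sketch below shows an illustrative OutputConfigModel whose outputMapping entries are OutputFieldMappingModel objects; each inputField is a SourceFieldSelector array, with source 0 used for system fields such as $CLUSTER_ID. The connection settings key, field names and source IDs are invented for the example.

```python
# Illustrative OutputConfigModel writing match results to a flat file.
# The connectionSettings key, field names and source IDs are invented.
output_config = {
    "description": "Match results to CSV",
    "connection": {
        "connectionType": "FLATFILE",
        "connectionSettings": {"path": "/data/output/matches.csv"},  # hypothetical key
    },
    "overwriteExisting": True,  # flat files only
    "outputMapping": [
        {"inputField": [{"source": 0, "field": "$CLUSTER_ID"}], "outputField": "cluster_id"},
        {"inputField": [{"source": 0, "field": "$MATCH_STATUS"}], "outputField": "match_status"},
        {"inputField": [{"source": 3, "field": "customer_id"}], "outputField": "customer_id"},
    ],
}
```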
RealTimeRecordResponse
Name | Description | Required | Schema | Default |
---|---|---|---|---|
matchStatus | The status of the match. | false | enum (EXACT, CLOSE, PROBABLE, POSSIBLE, NONE) | |
record | The record returned. | false | object | |
RealTimeSearchResponseModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
results | The results of a real-time search request. | false | RealTimeRecordResponse array | |
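As an illustration, a real-time search response might deserialise to something like the sketch below; the record contents depend entirely on your output mapping, and all values are invented.

```python
# Illustrative RealTimeSearchResponseModel: a list of RealTimeRecordResponse
# objects pairing a match status with the matched record (invented values).
real_time_search_response = {
    "results": [
        {"matchStatus": "EXACT", "record": {"customer_id": "CUST-000123", "full_name": "Jane Smith"}},
        {"matchStatus": "PROBABLE", "record": {"customer_id": "CUST-000987", "full_name": "J Smith"}},
    ],
}
```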
RecordId
Name | Description | Required | Schema | Default |
---|---|---|---|---|
recId | | false | string | |
source | | false | integer (int32) | |
RuleConfigModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
description | Description of the ruleset. | false | string | |
ruleSetId | Ruleset ID returned by the API. | false | integer (int32) | |
ruleVersion | Rule version. | false | enum (v1) | |
rules | A JSON-escaped string containing the rules. | false | string | |
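Because the rules field is a JSON-escaped string, a ruleset payload can be built by serialising the rule document before embedding it, as in the sketch below. The rule content shown is a placeholder only; this model does not define the rule syntax.

```python
import json

# Illustrative RuleConfigModel; the rules value is a JSON-escaped string.
# The rule document below is a placeholder, not real rule syntax.
rules_document = {"note": "placeholder - not real rule syntax"}

rule_config = {
    "description": "Default name and address rules",
    "ruleVersion": "v1",
    "rules": json.dumps(rules_document),  # embed the rules as an escaped JSON string
}
```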
SearchModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
search | Schema-less JSON object containing search criteria. The keys correspond to datasource input mappings. | true | object | |
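An illustrative SearchModel is sketched below; the keys of the schema-less search object are invented and should mirror your own datasource input mappings.

```python
# Illustrative SearchModel; the keys inside "search" are invented and should
# correspond to the input mappings configured for your datasource.
search_request = {
    "search": {
        "full_name": "Jane Smith",
        "postcode": "SW1A 1AA",
    },
}
```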
SearchResponseModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
results | An array of objects containing the records that match the search input. Each record object contains the record fields from your output mapping along with a match status. | false | MapOfstringAndstring array | |
SessionConfigModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
blockingKeyIds | The IDs of the blocking key configurations to use with the session. | true | integer (int32) array | |
datasourceIds | The IDs of the datasource configurations to use with the session. | true | integer (int32) array | |
description | Description of the session. | false | string | |
matchStoreConnection | The match store connection type and settings. | true | | |
outputId | The ID of the output configuration to use with the session. | true | integer (int32) | |
ruleSetId | The ID of the rule set to use with the session. | true | integer (int32) | |
sessionId | Session ID returned by the API; it is not included in the request. | false | integer (int32) | |
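The sketch below ties previously created configuration IDs together into an illustrative SessionConfigModel. All IDs are invented, sessionId is omitted because it is returned by the API, and the match store connection is assumed to follow the same shape as ConnectionConfigModel, which the table above does not state explicitly.

```python
# Illustrative SessionConfigModel; IDs are invented and the match store
# connection shape (ConnectionConfigModel-like) is an assumption.
session_config = {
    "description": "Customer single view session",
    "datasourceIds": [3],
    "blockingKeyIds": [1, 2],
    "ruleSetId": 1,
    "outputId": 1,
    "matchStoreConnection": {
        "connectionType": "MONGO",
        "connectionSettings": {"uri": "mongodb://localhost:27017/matchstore"},  # hypothetical key
    },
}
```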
SourceFieldSelector
Name | Description | Required | Schema | Default |
---|---|---|---|---|
field | The field name (column number for flat files). | false | string | |
source | The source ID. Set this to 0 if it is a system field such as $RECORD_ID or $CLUSTER_ID. | false | integer (int32) | |
SystemStatus
Name | Description | Required | Schema | Default |
---|---|---|---|---|
licenseStatus | | false | | |
warnings | | false | string | |
TransactionResultResponse
Name | Description | Required | Schema | Default |
---|---|---|---|---|
changes | The changes made to the clusters. | false | ClusterChange array | |
clusterId | ID of the cluster. | false | integer (int32) | |
newClusters | Information about the new clusters. | false | ClusterModel array | |
oldClusters | Information about the old clusters. | false | ClusterModel array | |
outcome | The outcome of the transactional request. | false | enum (ADD_NEW, ADD_EXISTING, MERGE, SPLIT, DELETE_RECORD, DELETE_CLUSTER, COMPLEX, NO_CHANGE) | |
recordId | ID of the record. | false | | |
recordsInCluster | Information about the records in the cluster. | false | RecordId array | |
TransactionalResultModel
Name | Description | Required | Schema | Default |
---|---|---|---|---|
results | Results from a transactional request. | false |