Overview

Experian Match is an innovative data matching and linkage solution, working alongside your enterprise technology and applications of choice to bring disparate identities together.

Experian Match supports the creation and maintenance of a single customer view:

Establish

  • Locate duplicates within existing systems

  • Establish linkage across data silos

  • Supplement decisioning and risk management with data enrichment

Maintain

  • Prevent creation of duplicates at source

  • Trap and cascade address changes

Real time search

  • 360-degree view of your customers

  • Improve real-time personalisation

  • Service subject access requests quickly

Concepts

Experian Match is powerful and highly configurable. To make the most of the product, we recommend you familiarise yourself with the core concepts.

Matching records

As the name suggests, the concept of matching records is the core principle behind Experian Match. Matching itself is the process of identifying which records are similar enough to be considered the same entity. This process involves standardising the input, blocking together similar records and comparing them based on configurable matching rules. The result of this is a collection of clusters containing matched records.

Match levels

Each match between two records can have one of four match levels. The default matching rules define the match levels as follows:

Exact

Each individual field that makes up the record matches exactly.

Close

Records might have some fields that match exactly and some fields that are very similar.

Probable

Records might have some fields that match exactly, some fields that are very similar and some fields that differ a little more.

Possible

The majority of fields in the records have a number of similarities but do not match exactly.

Data Configuration

Experian Match can connect and output to multiple data sources and data types. You will need to configure these connections as part of setting up your session.

The available data types are:

  • JDBC

  • Mongo

  • Flat file e.g. CSV

  • JMS

Data source

In order for Experian Match to start work on your data, you need to configure your data sources. Experian Match will need to know how to connect to your data source (including authentication information) and which data from the source to match on.

Match output

You will need to configure your data output if you want to output the results of your Matching jobs. Output configuration requires connection information to your database along with the fields you wish to output.

Experian Match provides four result fields that can be output alongside the source data which show the results of a matching job:

| Field | Field description |
|---|---|
| $CLUSTER_ID | The cluster ID generated for the record by matching |
| $MATCH_STATUS | The match level associating the record to the cluster, e.g. Exact, Close, Probable, Possible, None |
| $SOURCE_ID | The configured source ID from which a record originated |
| $RECORD_ID | The unique record ID of the record as used in matching |

Matching logic

Experian Match provides complete control over the matching logic and the way records are matched.

This can be configured using blocking keys and rules:

Blocking keys

Experian Match creates blocks of similar records to assist with the generation of suitable candidate record pairs for scoring. Blocks are created from records that have the same blocking key values. Blocking keys are created for each input record from combinations of the record’s elements that have been keyed. Keying is the process of encoding individual elements to the same representation so that they can be matched despite minor differences in spelling.
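
As an illustration, a blocking key built from the Soundex of the surname combined with the postcode would place "John Smith, SW4 0QL" and "Jon Smyth, SW4 0QL" in the same block, since both surnames key to S530, so the pair would go forward for scoring despite the spelling difference. The actual keys used are determined by your blocking key configuration.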

A default set of best practice blocking keys is provided with the software for name and address data. These can be retrieved via the REST API, modified to match your input data and requirements, and then submitted back via the REST API.

Rules

Before setting up your match session, a rule set must be configured. A rule set is a set of logical expressions (rules), written in our own Domain Specific Language (DSL), which control how records are compared and how match levels are decided. We have designed the rule DSL to give you complete control over how records match.

A default best practice rule set is provided with the software for name and address data. This can be adjusted for optimal matching depending on your data and requirements.

Match store

The match store is created and configured when you set up your session. When you run a match job, the relevant match store will be populated or updated depending on the session. The store contains the newly created clusters of records. Performing an output request will output the cluster IDs from the match store to your desired location.

Clusters

A cluster is a collection of records that have been identified as representing the same entity using the rules that you have provided.

Scenarios

Performing a match and output job

The steps below outline configuring a matching session, using this session to run a match job and outputting the clustered records. As part of this process a match store is established.

Each step returns an ID in the response; these IDs are needed in subsequent steps to configure and perform a matching job.

The steps are:

  1. Create a data source configuration to set where your records are located

  2. Create an output configuration to set where you want the clustered records to be written to

  3. Create a rule configuration to decide how you want to match your records

  4. Create blocking key configurations.

  5. Create a session configuration to connect your configuration objects together

  6. Run a match job and output the results

Follow the match and output tutorial for a detailed run through, and example requests.

Perform maintenance on data in the match store

The steps below outline the tasks required to maintain data in the match store. Maintenance operations let you add, update or delete records in a transactional manner, bringing the match store into line with changes to the source data. The data source for the update must not be a flat file.

  1. Run a Match job following the Match and Output scenario to establish a match store that can be searched

  2. Add, update or delete a record from the match store. Each transactional update will re-cluster any affected match store records.

Once you’ve made your maintenance update, you can trigger actions in the matching system as the data in your data source changes.

Search the match store for a target record

The steps below outline the tasks required to search the match store for records that could potentially match against your target record. This is particularly useful for checking whether similar records already exist before entering a new record into your database.

The steps are:

  1. Run a Match job following the Match and Output scenario to establish a match store that can be updated

  2. Search the built Match Store for a target record. The API will return a collection of records along with the Match status, allowing you to make a decision on what to do with your target record.

Follow the match store searching tutorial for a more detailed run through.

For any issues with running a matching job, check the troubleshooting section

Installing

Requirements

In order to deploy Experian Match you must deploy the API under an application server, install and configure the Standardisation service, and install your database drivers. The system requires access to a JDBC compliant database to store its index tables. You may use your existing infrastructure, or deploy a dedicated instance as you wish.

Software requirements

  1. Windows Server 2012 or greater

  2. .Net Framework 4.5

  3. Java JRE 8

  4. Application server capable of deploying a war file.

    • Apache Tomcat 8.5 is recommended

Hardware requirements

Application server

The matching system is highly multi-threaded and will benefit from running on a machine with multiple cores. Minimum and recommended hardware specifications are as follows.

Minimum:

  • 2 CPU Cores @ 2GHz+

  • 8GB RAM

  • 100MB disk space (to install Standardisation reference data for only 1 country)

Recommended:

  • 8 CPU Cores @ 2GHz+

  • 32GB RAM

  • 600MB disk space (to install Standardisation reference data for all countries)

Installing Standardize

Experian Match uses an external service, GdqStandardizeServer, to perform input standardisation. This service must be running for Experian Match to work correctly.

Setup

Prior to installing or starting the service, data should be installed.

  • Copy the Standardize directory to a location of your choosing; we recommend C:\Program Files\Experian\Standardize.

Data installation

By default GdqStandardizeServer data should be stored in a folder Data within the Standardize install directory.

  • Extract the contents of the GDQStandardizeData zip file to a data folder in the install directory, e.g. C:\Program Files\Experian\Standardize\Data

See Advanced configuration for setting an alternative directory.

Installing or running the service

Experian.Gdq.Standardize.Standalone.exe can be run directly as a standalone executable. However, it is preferable to install it as a Windows service.

  1. In an Administrator PowerShell prompt, navigate to the installation location.

  2. Run .\GdqStandardizeServiceManager.ps1 install. This will register the service in the Windows Services console.

  3. Run .\GdqStandardizeServiceManager.ps1 start. This will attempt to start the service.

  4. If you encounter any errors, check your configuration and try again. Note the service will not start if the data or license key is wrong. All errors are written to the system event log. Alternatively, it may be helpful to run the executable directly from the command line while troubleshooting as this will highlight the error.

Advanced configuration

The following parameters within the configuration file Experian.Gdq.Standardize.Standalone.config can be modified:

FilePath

Path to the data which GdqStandardizeServer requires to function.

Default: './Data'

hostIp

The IP address on which GdqStandardizeServer will listen for input.

Default: 127.0.0.1 (i.e. localhost)

port

The port on which GdqStandardizeServer will listen for input.

Default: 5000

defaultCountry

The default country which GdqStandardize will use when processing records.

Default: GBR

defaultCountryInfluence

The default level of influence which GdqStandardize will use when processing records. We recommend leaving this unchanged in most cases, particularly if input records cover multiple countries. A higher value, for example 500, can be used to force Experian Match to treat all input records as coming from the defaultCountry.

Default: 50

Override alias to rootname mappings

It is possible to override the file containing the alias to rootname mappings using the property "standardisation.rootname.file.path".

Matching standardisation port

The Matching product is configured to use GdqStandardizeServer at the default address 127.0.0.1:5000. To change this, edit the application.properties file in the deployment directory and add two properties, standardisation.host and standardisation.port, set to the required values.
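
For example, to point Experian Match at a GdqStandardizeServer instance listening on another machine, application.properties would contain entries such as the following (the host and port values are illustrative):

standardisation.host=192.168.0.10
standardisation.port=5050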

Country processing

When standardising records, GdqStandardize needs to know what country the data in the record is referring to, in order to derive more information. Proper, country-specific standardisation affects the rest of the Matching process, as it changes how potential matches are found.

The defaultCountry configuration setting influences which country the standardisation system will assume the record is from. However, this influence can be overridden on a per-record basis by specifying an ISO 3166-1 alpha-3 code in the input data. The data should be mapped to the COUNTRY data type.

Common ISO codes:

| Country | ISO 3166-1 alpha-3 |
|---|---|
| United Kingdom | GBR |
| United States | USA |
| Australia | AUS |
| France | FRA |

Deploying

The Experian Match REST API must be deployed under an application server. The instructions below are given for Apache Tomcat.

Install the latest stable version of Tomcat (currently 8.5) according to the Apache installation instructions.

Experian Match REST API is deployed like any other web application by copying the supplied war file to the CATALINA_HOME\webapps directory.

To check that your deployment was successful, navigate to http://localhost:{port}/matching-rest-api-{VersionNumber}/swagger-ui.html. The default Tomcat port is 8080.

Persistent configuration

By default, configurations created via the REST API are not saved to disk. As such, when the application server shuts down (or is redeployed) any configurations are lost, and must be recreated on start-up. To enable persistent configuration, set a path using the matching.configLocation property.

Memory tuning

Ensure that you have as much memory allocated to the Tomcat JVM as possible. Set the maximum heap size (the -Xmx setting) as high as possible while leaving sufficient memory for the operating system and any other running processes. A maximum heap size of at least 8GB is recommended.

export CATALINA_OPTS="$CATALINA_OPTS -Xms1g -Xmx12g"
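
On Windows, assuming a standard Tomcat installation, the equivalent setting can be placed in CATALINA_HOME\bin\setenv.bat:

set "CATALINA_OPTS=%CATALINA_OPTS% -Xms1g -Xmx12g"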

Database drivers

If you are planning to connect to a SQL database you will need to make sure that you have installed the relevant JDBC database drivers.

Note: If your configurations only include Flat Files, Mongo, or HSQL, the following steps are not required and can be skipped. Installation of the SQL drivers can be done at a later date by following the steps and restarting the application.

To install the SQL Server drivers:

  1. Download the .exe file from https://www.microsoft.com/en-us/download/details.aspx?id=11774. The .exe file is an archive that you will need to unzip.

  2. Unzip this to a location of your choice and navigate to this location.

  3. Copy the sqljdbc42.jar file located in {EXTRACT LOCATION}\sqljdbc_6.0\enu\jre8 to your CATALINA_HOME\lib directory

To install the Oracle drivers:

  1. Download the ojdbc6.jar file from http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html

  2. Copy the ojdbc6.jar file to your CATALINA_HOME\lib directory.

You will need to supply the relevant driver class name when making an API request that includes the connectionSettings object:

  • SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver

  • Oracle: oracle.jdbc.driver.OracleDriver

Licensing

Experian Match must be deployed and correctly licensed before Match jobs can be run.

Ensure the steps in Deploying are complete. During deployment Experian Match generates a number of licensing files; see Advanced license configuration for further details.

  1. Navigate to http://localhost:{port}/matching-rest-api-{VersionNumber}/swagger-ui.html#!/System/getSystemStatus. Get the SystemStatus to retrieve the LicenceStatus object of the form:

    {
      "warnings": "",
      "licenseStatus": {
        "isLicensed": false,
        "licenseExpiry": null,
        "updateKey": "1ABC4EF6G9HIJKLMPBJJ61",
        "licenseFolder": "C:\\ProgramData\\Experian",
        "dllLocation": "C:\\Users\\username\\AppData\\Local\\Temp\\Experian-Licensing-8028610556869156823",
        "messages": "License not found."
      }
    }
  2. Your Experian support representative will request the updateKey and generate an updateCode for you to apply to Experian Match.

  3. Apply the updateCode to Experian Match via the endpoint http://localhost:{port}/matching-rest-api-{VersionNumber}/swagger-ui.html#!/System/applyUpdateCode. Experian Match will return the new LicenceStatus containing the license state and expiry information. Match jobs can now be run.

Advanced license configuration

By default Experian Match generates and stores licensing information within C:\ProgramData\Experian. Licensing DLLs are copied to a temporary directory on deployment of the application. Both of these locations can be overridden using configuration properties if required.

If configuration properties have been set, Experian Match expects the requisite files to exist at that location:

  • matching.licensing.licensefolder

    • edqlic.ini - the softkey directory file (initially empty).

    • edqlickeys.ini - the file containing the base license key.

  • matching.licensing.dllpath

    • EDQLicED.dll or EDQLicGD.dll for 32bit and 64bit operation respectively.

Configuring

Matching has the following configuration properties:

| Name | Type | Default | Description |
|---|---|---|---|
| matching.configLocation | java.lang.String | Configuration will not be persisted | A directory where configuration settings will be persisted |
| standardisation.rootname.file.path | java.lang.String | The internal root name file will be used | The location of an alias to rootName file |
| matching.licensing.licensefolder | java.lang.String | C:\ProgramData\Experian | Path to the directory containing the license information |
| matching.licensing.dllpath | java.lang.String | Temp folder, e.g. %CATALINA_BASE%\temp\Experian-Licensing-<random> | Path to the directory containing licence DLLs (EDQLicGD.dll, EDQLicED.dll) |
| matchstore.purge | java.lang.String | false (Match Stores will be persisted by default) | Whether to purge the Match Store once the job has completed |

Adding configuration using Tomcat

  1. Create an xml file under CATALINA_HOME\conf\Catalina\localhost\. This file should have the same name as the deployed WAR file. If the WAR has been deployed under ROOT, the configuration file should be called root.xml.

  2. Add the required configuration as an Environment property in the <Context> block.

For example:

<Context>
    <Environment name="matching.configLocation" value="c:\Experian\Match\config" type="java.lang.String" override="true" />
    <Environment name="standardisation.rootname.file.path" value="c:\Experian\Match\rootNames.txt" type="java.lang.String" override="true" />
</Context>

A Tomcat restart is required to load the settings after this file is created or modified.

matching.configLocation

It is recommended that the save location is set to a directory outside the Tomcat installation directory, as this is often a volatile directory, and not suitable for user data.

This directory should be included as part of a standard backup policy.

standardisation.rootname.file.path

The file should contain no header and only two columns separated by a comma:

  1. Alias - The first column contains the name aliases. These should be unique. If an alias appears more than once, only the last entry that appears in the file will be used during Standardisation.

  2. RootName - The second column contains the root name, which the alias maps to. The same root name can appear multiple times in this column.

An example of how this file should look is as follows:

ABBY,ABIGAIL
ABDLE,ABDUL
ABDOU,ABDUL
ABDUH,ABDUL
ABY,ABIGAIL

Tomcat must be restarted after this file is created or modified.


Datasources

If you are connecting to a database you will need to supply credentials for a database user. This is in order for the system to read from the data source containing the records you wish to match, and to write the output results. The user must have db_reader permission on the data source, and db_writer permission on the output table.

A different database user is also required to manage the system’s index tables. This user must have db_owner, db_reader, and db_writer permissions as it will create/drop its index tables, and must be able to read/write to/from them.

The valid settings for connecting to data sources, output sinks, or index stores are shown below.

FLATFILE

Values marked (F) or (D) are valid only for fixed width or delimited files respectively, and the two cannot be mixed.

{
    "Path" : "<path to file accessible by matching system>",
    "Header" : "<whether the first line contains headers (true|false)>",
    "LineEnding" : "<string that indicates a newline>",
    "Delimiter" :  "<delimiter char (D)>",
    "Quote" : "<character used to quote strings that contain the delimiter; either a double or single quote, defaults to a double quote (D)>",
    "ColumnWidths" : "<comma separated list of column widths (F)>",
    "PaddingChar" : "<character with which to pad columns in fixed-width files (F)>"
}

Search and transactional updates are not supported with flat file data sources.

Note that flat file input size is limited by the amount of memory allocated to the JVM. This is because all input fields from a flat file source are loaded into memory at the start of the match job. For large input files it is recommended that the JVM maximum heap size be set to at least 8GB.
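
For example, a delimited CSV source with a header row might be configured as follows (the path and values are illustrative):

{
    "Path" : "C:\\Data\\customers.csv",
    "Header" : "true",
    "LineEnding" : "\r\n",
    "Delimiter" : ",",
    "Quote" : "\""
}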

JDBC

{
    "JdbcUrl" : "<a valid JDBC URL>",
    "JdbcDriver": "<the JDBC driver to use>",
    "Table" : "<the name of the table to use>",
    "UserName" : "<the optional username to connect with>",
    "Password" : "<the optional password for the username>",
    "HashBlockingKeys": "true|false"
}

HashBlockingKeys is a Match Store setting only. See Hashing under Tuning in the documentation for more information on this.
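
As an illustration, a SQL Server connection might look like the following (the server name, database, table and credentials are placeholders):

{
    "JdbcUrl" : "jdbc:sqlserver://dbserver:1433;databaseName=CustomerDB",
    "JdbcDriver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "Table" : "Cust_Index",
    "UserName" : "match_user",
    "Password" : "<password>",
    "HashBlockingKeys": "true"
}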

HSQL Data Source

The Experian Match .war file contains support for HSQL database match stores. HSQL is a simple, disk backed, in-memory relational database management system.

Note that due to the limitations of memory stores, an HSQL database is not recommended for production deployments.

Experian Match will generate a HSQL database on a local file path when configured as a JDBC connection as follows:

{
    "JdbcUrl" : "jdbc:hsqldb:file:<escaped path to generated database>",
    "JdbcDriver": "org.hsqldb.jdbc.JDBCDriver",
    "UserName" : "sa",
    "Password" : "",
    "Table" : "<the name of the table to use>"
}

A relative database path such as match\\hsql.db will store the database relative to your web server. Alternatively an absolute path such as c:\\temp\\experianmatch\\hsql.db can be used.

On shutdown of Experian Match, the database will be persisted to disk at the location defined.

  • For JDBC connections, the following tables will be created:

    • <table_name> as specified in the connectionSettings (e.g. Cust_Index), storing record information.

    • <table_name>_KEYS (e.g. Cust_Index_KEYS), storing blocking keys for each record.

    • <table_name>_CLSTRS (e.g. Cust_Index_CLSTRS), storing cluster information.

    • <table_name>_SCORE_PAIRS (e.g. Cust_Index_SCORE_PAIRS), storing the results of record comparison and rule evaluation.

JDBC Data Source Tuning

Experian Match utilises HikariCP connection pooling, which can be configured externally to Experian Match by adding a properties file named hikari.properties to the web application's resources directory. Experian Match uses the default settings, and additional tuning may be necessary based on the JDBC data source used. See the Hikari configuration settings for full details and suggestions for configuring connectionTimeout, maximumPoolSize, and dataSource.*.
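
As a sketch, a hikari.properties file adjusting the two most commonly tuned settings might look like this (the values are illustrative and should be chosen for your environment and database):

connectionTimeout=30000
maximumPoolSize=20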

MONGO

{
    "MongoUrl" : "<the URL of the MongoDB instance>",
    "MongoPort" : "<the port to use when connecting to the MongoDB instance>",
    "Database" : "<the database to use>",
    "Collection" : "<the collection to use>",
    "UserName" : "<the optional username to connect with>",
    "Password" : "<the optional password for the username>"
}

Note that the maximum size of the Mongo source collection is limited by the amount of memory allocated to the JVM, since all source fields are loaded into memory at the start of the match job. For large source collections it is recommended that the JVM maximum heap size be set to at least 8GB.

JMS

{
    "JmsQueueName" : "<the JMS queue name>",
    "BrokerURL" : "<the URL of the broker to use>",
    "UserName" : "<the optional username to connect with>",
    "Password" : "<the optional password for the username>",
    "EndOfQueueString" : "<the string to signify the end of queue, default value 'EOF'>",
    "TimeoutMS" : "<the timeout for reading from the JMS queue in milliseconds, default value 5000>"
}

Search and transactional updates are not supported with JMS data sources.

If JMS is used as an input data source then fields from this source cannot be referenced in the output mapping. If JMS is used as the output source it will only output the results of the matching jobs. No input source fields will be included.

A unique end of queue string which does not occur in the input data source should be supplied to signify the end of a JMS queue. It is advisable to use different JMS queue names for the JMS data source and the JMS output source.

Tuning

Rules

Concepts

In order to create your own rule set or modify the default rule set it is important to understand the concepts.

Rules take the following form: <rule reference>=<expression>

A rule reference consists of a rule name followed by a ‘.’ followed by a match level.

An expression may take multiple forms. It is either a low-level expression operating on the elements within a record, or a higher-level expression composed of references to other rules.

A rule set is made up of a combination of three rule types, which increase in specificity from Match to Theme to Element. Match and Theme rules are composed of references to rules from the level below. Element rules operate on specific data elements, for example a postcode or building number.

We can visualise this using a tree diagram:

(Figure: rule hierarchy tree showing Match rules referencing Theme rules, which in turn reference Element rules.)

Syntax
  • All rules have a rule reference on the left hand side (LHS)

    • Rule reference = <rule name>.<match level>

      • Rule name = ‘Match’ (match rule), custom identifier (theme/multi-element rule), or element (single element rule). Custom identifiers must begin with an alphabetical character and can consist of numerical characters, underscores and hyphens.

      • Match level = Exact, Close, Probable, Possible

  • The right hand side (RHS) is always surrounded by {}

  • The RHS may include logical operators & (and) and | (or).

    • Expressions may be nested and logical operators combined (parentheses are required) e.g. MyRule.Probable = {((RuleA.Possible & RuleB.Probable) | (RuleA.Probable & RuleB.Possible)) | RuleC.Exact}

  • Element rules include the element and the allowed result set (enclosed in [], comma separated) and may also include an optional element modifier and/or comparator.

  • Any theme or element rule may also optionally include a field group from the input field mappings, which is identified using a hash symbol before the field group name e.g. #MyFieldGroup.PostcodeTheme.Exact = {Postcode[ExactMatch]}. See the Field Groups section below for further information.

Match levels

There are four match levels that can be used within each rule specification. These are Exact, Close, Probable and Possible. When working with match levels, note that:

  • Match levels will be evaluated in order from Exact through to Possible, stopping at the first level that passes.

  • Records will be considered a match, and be clustered together, if any of the defined top level overall rule match levels evaluate to true.

  • Every rule must include a match level as part of the rule reference (LHS).

Rule evaluation

Rules are evaluated as follows:

  1. Match rules first - the Match.<MATCHLEVEL>

  2. Left to right

  3. Higher order rule matches mean that lower match levels are not required to be evaluated.

  4. Evaluated lazily

Using rules with Experian Match

Rule sets are supplied as a JSON escaped string when making a rules configuration request. JSON has a number of reserved characters, such as line breaks, that need to be escaped when supplying a string as part of a request. This involves replacing these characters with the relevant escape sequences, e.g. \n for a line break.

Long JSON escaped strings, such as a matching rule set, are not very human readable. We recommend that you make changes to your rules before escaping the string. There are a number of free online tools that will escape JSON for you.
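
For example, the two-line rule set below (purely illustrative) would be escaped by replacing each line break with \n:

Match.Exact={Name.Exact}
Name.Exact={Forenames[ExactMatch] & Surname[ExactMatch]}

becomes:

"Match.Exact={Name.Exact}\nName.Exact={Forenames[ExactMatch] & Surname[ExactMatch]}"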

The default rule sets for each supported country can be found in the Default Rules section below.

Rule types

There are three rule types that can be defined within the rules: Match, Theme and Element (single element, multi element, generic string and date).

Match rule

This is the highest level of rule, defining an overall match between two records. A match rule is made up of references to other rules. The rule name must always be ‘Match.<Match Level>’.

At least one match rule must be defined for a successful matching job.

Example: Match.Exact={Name.Exact & Address.Exact} (Name.Exact and Address.Exact have been defined separately)

Rule references can be combined into compound logical expressions. In this way, you have complete control over the logic used to determine matches.

Example: Match.Close={(Name.Exact & Address.Exact) | (Name.Exact & Email.Exact & Phone.Exact)}

Theme rule

This is the next level down. Much like a match rule, a theme rule is made up of references to other rules. The rule name must begin with an alphabetical character and can consist of numerical characters, underscores and hyphens, excluding ‘Match’ and the set of reserved elements (explained further below).

Example: Address.Exact={Premise.Exact & Street.Exact & Locality.Exact & Postcode.Exact}

In a theme rule, the rule references within the expression (i.e. on the RHS) could either be further theme rules or low-level element rules.

Element rule

Element rules are the most granular of rules. They can be used to specify how to compare individual elements within a record. Elements are basic units of data that comprise an overall theme. For example a postcode or premise would be examples of address elements.

Single element rule

This is the simplest type of element rule, operating on a single element.

Example: Title.Exact={[ExactMatch,NonePopulated]}

The element to compare (Title) is specified on the LHS. On the RHS, we have a set of allowed results. For this rule to evaluate to true – i.e. for us to consider a Title match to be Exact – the Title elements in two records must either be an exact match (i.e. the strings must match exactly) or both be blank (NonePopulated).

We can also specify element modifiers and custom comparators.

Example: Title.Close = {StandardAbbreviation.Levenshtein[90%,NonePopulated]}

Here, the element modifier is StandardAbbreviation – meaning the standardised form of the title (e.g. Mister → Mr.) and the comparator is Levenshtein (an approximate string matching algorithm). This rule evaluates to true if the two StandardAbbreviation forms of the Title element are >=90% similar or if they’re both blank.

Where no element modifier is supplied, the unmodified element will be used.

Where no comparator is supplied, the default will be used – this does exact string matching, and may return ExactMatch, OnePopulated, NonePopulated, NoMatch.

If you wish to use multiple element modifiers and/or comparators in combination, it is possible to build up more complex rules.

Example: Forenames.Close={JaroWinkler[90%] | RootName[ExactMatch]}

This means, “for forenames to match closely, either the unmodified forenames element must match >=90% using the Jaro-Winkler approximate string matching algorithm, or the root name derived from the forename must match exactly” (this latter part allows Bill and William to match for example).

Multi-Element Rule

A multi-element rule allows you to compare multiple elements within a single rule.

Example: SubBuilding.Exact={SubBuilding_Number[ExactMatch,NonePopulated] & SubBuilding_Description[ExactMatch,NonePopulated] & SubBuilding_Type[ExactMatch,NonePopulated]}

This compares 3 elements – SubBuilding_Number, SubBuilding_Description, SubBuilding_Type, using no element modifier and the default comparator in all cases. Here, the element type must be specified on the RHS, and the rule name on the left is a custom identifier.

Note that it would be possible to achieve the same result using 3 single element rules and a theme rule to combine them, however the above is preferred for brevity/readability.
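
To tie the three rule types together, the following is a minimal illustrative rule set (the elements, comparators and thresholds shown are examples only, not a recommended configuration):

Match.Exact={Name.Exact & Address.Exact}
Match.Close={Name.Close & Address.Exact}
Name.Exact={Forenames[ExactMatch] & Surname[ExactMatch]}
Name.Close={Forenames.JaroWinkler[90%] & Surname[ExactMatch]}
Address.Exact={Building_Number.PremiseCompare[ExactMatch] & Postcode.PostcodeCompare[ExactMatch]}

Here Match.Exact and Match.Close are match rules, Name and Address are theme/multi-element rules, and each bracketed expression on the right hand side is an element comparison.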

Field Groups

Any element can be used multiple times in input to represent separate/unique sets of information. For example, the input may include:

  • multiple street numbers, street names and postcodes e.g. a delivery address and a billing address

  • multiple forenames and surnames e.g. a primary account holder and a spouse

  • multiple generic strings e.g. an account reference and a customer reference

  • multiple dates e.g. a date of birth and a registration date

  • multiple phone numbers e.g. a mobile number and a home number

  • multiple email addresses e.g. a personal address and a work address

It may be desirable to handle these separately during rule evaluation e.g. have a more strict rule for a billing address but a more flexible rule for the delivery address.

To support this, any element can be mapped as part of a separate field group, and the field group may be specified as part of the rule expression. This is achieved by prefixing the rule with a hash symbol, followed by the field group name, followed by a full stop, before then writing the rest of the rule e.g. #FieldGroupName.Name.Exact=.

There are a number of places where the field group can be used:

  • on the LHS within an element rule e.g. #Parent.Surname.Exact={[ExactMatch]} - in this example, the exact rule on the surname element only applies to the surname in the 'Parent' field group

  • on the LHS within a theme/match rule e.g. #Delivery.Address.Exact={Address.Exact} - in this example, the exact address theme rule only applies to address elements contained in the 'Delivery' address field group

  • on the RHS within an element rule e.g. Title.Exact={#Parent.Title[ExactMatch,NonePopulated]} - in this example, the exact rule on the title element only applies to the title in the 'Parent' field group

  • on the RHS within a theme/match rule e.g. Match.Exact={#Billing.Address.Exact & #Delivery.Address.Exact & Name.Exact} - in this example, the exact match rule requires the name to be exact as well as the 'Billing' and 'Delivery' address field groups to be exact

Note that the field group identifier cannot be used on both sides of the rule expression. For example, #AccountID.Close={#AccountID.GenericString[ExactMatch]} is not a valid expression.

A field group name must begin with an alphabetical character and can consist of numerical characters, underscores and hyphens. A field group name cannot include a hash symbol itself e.g. #Account#ID.Exact will not be accepted.

Match.Exact={#Delivery.Address.Exact & #Billing.Address.Exact & GenericStringCombined.Exact}
#Billing.Address.Exact={StreetNumber.Exact & PostcodeTheme.Exact}
#Delivery.Address.Exact={Address.Exact}
Address.Exact={PostcodeTheme.Exact}
StreetNumber.Exact={MinorStreet_Number.PremiseCompare[ExactMatch]}
PostcodeTheme.Exact={Postcode.PostcodeCompare[Part1Match] & Postcode.PostcodeCompare[Part2Match]}
GenericStringCombined.Exact={#UserId.Generic_String[ExactMatch] & #MembershipID.Generic_String.JaroWinkler[95%]}

The example above shows how the field groups can be used at all levels of the rule hierarchy. The #Billing address group requires the street and postcode within that group to be exact matches, while the #Delivery address group only requires the postcode within that group to be exact. Additionally, the generic string element rule evaluates the two different generic strings from the different field groups in two different ways.

Elements

These are the available elements and the comparators that can be used with them.

| Element name | Element description | Example | Available comparators | Available element modifiers |
|---|---|---|---|---|
| Title | Title | Mrs | ExactString | Default, StandardSpelling, StandardAbbreviation |
| Forenames | Given name/names and any initials | John | ExactString, ForenameCompare, Levenshtein, JaroWinkler | Default, RootName |
| Surname_Prefix | Surname prefix | De la | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
| Surname | Surname with prefix | Smith | ExactString, Levenshtein, JaroWinkler | Default |
| Surname_Suffix | Surname suffixes | Junior | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
| Gender | Gender | Female | ExactString, Levenshtein, JaroWinkler | Default |
| Honorifics | Honorifics | Ph.D | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
| Building_Description | Building name and type | George West House | ExactString, Levenshtein, JaroWinkler | Default |
| Building_Number | Building number | 43 | ExactString, PremiseCompare | Default |
| SubBuilding_Number | Sub-building number | 2 | ExactString, PremiseCompare | Default |
| SubBuilding_Description | Sub-building name | First-floor | ExactString, Levenshtein, JaroWinkler | Default |
| SubBuilding_Type | Sub-building type | Flat | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
| MinorStreet_Number | Street number | 34th | ExactString, PremiseCompare | Default |
| MinorStreet_Predirectional | Street pre-directional | South | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
| MinorStreet_Description | Street name | Johnston | ExactString, Levenshtein, JaroWinkler | Default |
| MinorStreet_Type | Street descriptor | Street | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
| MinorStreet_Postdirectional | Street post-directional | South | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
| PoBox_Number | PO Box number | 79 | ExactString, Levenshtein, JaroWinkler | Default |
| PoBox_Description | PO Box description | PO Box | ExactString, Levenshtein, JaroWinkler | Default, StandardSpelling, StandardAbbreviation |
| DoubleDependentLocality | A small locality such as a village, used to identify an address where a street appears more than once in a dependent locality | Kingston Gorse | ExactString, Levenshtein, JaroWinkler | Default/Derived, StandardSpelling |
| DependentLocality | Smaller locality used to identify an address where a street appears more than once in a locality | East Preston | ExactString, Levenshtein, JaroWinkler | Default/Derived, StandardSpelling |
| Locality | A larger locality such as a town or city | Cambridge | ExactString, Levenshtein, JaroWinkler | Default/Derived, StandardSpelling |
| Province | A larger area of a country, contains multiple localities | Cambridgeshire | ExactString, Levenshtein, JaroWinkler | Default/Derived, StandardSpelling, StandardAbbreviation |
| Country | Country name | United Kingdom | ExactString, Levenshtein, JaroWinkler | Default |
| Postcode | Postal code | SW4 0QL | ExactString, PostcodeCompare | Default/Derived, Levenshtein, StandardSpelling |
| Generic_String | Generic string | ab-1234cdef | ExactString, Levenshtein, JaroWinkler | Default |
| Date | ISO date in the format YYYY-MM-DD | 1980-06-21 | ExactString, DateCompare | Default |
| Phone | Phone number | (01234) 567890 | ExactString, Levenshtein, JaroWinkler | Default |
| Email | Email address | john.smith@domain.com | ExactString, Levenshtein, JaroWinkler | Default |
| Email_Local | Local part of email address | john.smith | ExactString, Levenshtein, JaroWinkler | Default |
| Email_Domain | Email domain | domain.com | ExactString, Levenshtein, JaroWinkler | Default |

Comparators

The following comparators can be used. The results available for the default comparator (ExactString) will also be available for all other comparators.

Comparator Results

ExactString (default comparator)

ExactMatch: Strings match exactly e.g. "John Smith" & "John Smith"

OnePopulated: The field is populated for one of the records e.g. "John Smith" & ""

NonePopulated: The field is not populated for either of the records e.g. "" & ""

NoMatch: The strings are both populated but are not an exact match e.g. "John Smith" & "John Doe"

ForenameCompare

InitialVsFullName: An initial or initials match to the full name e.g. "S J" & "Sarah Jane"

Plus all ExactString (default comparator) results.

PremiseCompare

StartMatch: Premise matches the start of a premise range e.g. "12" & "12-15"

StartMatchAndEncapsulated: Premise ranges match at the start and one encapsulates the other e.g. "12-15" & "12-16"

EndMatch: Premise matches the end of a premise range e.g. "15" & "12-15"

EndMatchAndEncapsulated: Premise ranges match at the end and one encapsulates the other e.g. "13-16" & "12-16"

Encapsulated: Premise or premise range is encapsulated by the other e.g. "12" & "11-16"

Overlapped: Premise ranges overlap each other e.g. "12-15" & "14-18"

NumberMatchWithTrailingAlpha: Premise numbers match and one record has a trailing alpha e.g. "12" & "12a"

NumberMatchWithDifferingAlpha: Premise numbers are an exact match but trailing alpha is different e.g. "12a" & "12b"

Plus all ExactString (default comparator) results.

DateCompare

DayMonthReversed: Matches dates where the day and month are reversed e.g. "2017-06-03" & "2017-03-06"

MonthYearMatch: Matches dates where only the month and year match e.g. "2017-06-03" & "2017-06-04"

DayMonthMatch: Matches dates where only the day and month match e.g. "2017-06-03" & "2016-06-03"

DayYearMatch: Matches dates where only the day and year match e.g. "2017-06-03" & "2017-07-03"

YearMatch: Matches dates where only the year matches e.g. "2017-06-03" & "2017-07-04"

Plus all ExactString (default comparator) results.

PostcodeCompare

Part1Match: Records match to the first part of the postcode e.g. "HA2 9PP" & "HA2 5QR"

Part2Match: Records match to the second part of the postcode e.g. "SM1 9PP" & "HA2 9PP"

Plus all ExactString (default comparator) results.

Levenshtein

Depending upon specified comparison type, either:
<Minimum %>: The minimum Levenshtein percentage to provide a match (integer between 0-100) e.g. setting the LevenshteinPercent result to 90 would return a match for "John Smith" & "Joan Smith"
Or:
<Maximum distance>: The maximum Levenshtein distance to provide a match (integer). e.g. setting the LevenshteinDistance result to 1 would return a match for "John Smith" & "Joan Smith"

Plus all ExactString (default comparator) results.

JaroWinkler

<Minimum %>: The minimum Jaro-Winkler distance percentage to provide a match (integer between 0-100). e.g. setting the JaroWinkler result to 95 would return a match for "John Smith" & "Joan Smith"

Plus all ExactString (default comparator) results.

Element modifiers

The Match REST API is able to identify and correct many known terms that may appear in the input record. A selection of element modifier keywords can be used to retrieve modified versions of the input elements within rule definition.

<rule name>.<match level>={<element>.<element modifier(optional)>.<comparator(optional)>[comparator results] (&...)}

| Element modifier | Operation |
|---|---|
| (Default) | The element classified from the input in a cleaned form |
| StandardSpelling | The element converted to a standard spelling (contains the Derived value when available) |
| StandardAbbreviation | The element converted to the standard abbreviation |
| Derived | A derived value that was inferred from other information in the input address |

Example:

MinorStreet_Type.Exact={MinorStreet_Type.StandardSpelling[ExactMatch]}

Examples

Initial vs full name

| | Forenames | Surname |
|---|---|---|
| Record 1 | Robert | Brooke |
| Record 2 | R | Brooke |

Name.Probable = {Forenames.ForenameCompare[InitialVsFullName] & Surname[ExactMatch]}

Minor street number

| | MinorStreet_Number | MinorStreet_Description | MinorStreet_Type |
|---|---|---|---|
| Record 1 | 123 | Burnthouse | Lane |
| Record 2 | 123a | Burnthouse | Lane |

StreetAddress.Close = {MinorStreet_Number.PremiseCompare[NumberMatchWithTrailingAlpha] & MinorStreet_Description[ExactMatch] & MinorStreet_Type.StandardAbbreviation[ExactMatch]}

Postcode

| | MinorStreet_Description | MinorStreet_Type | Locality | Postcode |
|---|---|---|---|---|
| Record 1 | Hints | Road | Tamworth | B78 3AB |
| Record 2 | Hints | Road | Tamworth | B78 3AT |

Address.Probable = {Building_Number[ExactMatch] & MinorStreet_Description[ExactMatch] & Locality[ExactMatch] & Postcode.PostcodeCompare[Part1Match]}

Default Rules

Download text file containing the JSON escaped default rules string for the United Kingdom.

Download text file containing the JSON escaped default rules string for Australia.

Blocking key configuration

To be effective, blocking keys should represent a range of contact data sub-element combinations. Experian Match can provide default blocking keys tuned for Name and Address matching. These blocking keys may need modifying to suit the input data and use case.

The default blocking keys for a country can be obtained using the GET request /v2/configuration/blockingKey/default/{countryISO3}, specifying a country ISO3 code. Blocking keys are currently included for the United Kingdom (GBR) and Australia (AUS).

Each blocking key is defined by a BlockingKeyConfigModel object, and the default response is an array containing a list of these key specifications.

Blocking keys can be added to Experian Match by a POST to /v2/configuration/blockingKey containing an array of BlockingKeyConfigModel objects. Each key is allocated an ID to use when creating the session configuration.

Blocking keys are added to a session configuration as a blockingKeyId array. A search session should contain all, or a subset, of the blocking keys from the session which created the match store, because searches are only performed on blocking keys that were created during the match job.

Note: If the blocking keys are updated on the session then the match job will need to be re-run to update the blocking keys in the match store.

All of the Elements that are available in the rules can also be mapped in a BlockingKeyConfigModel. Keyed forms of these elements are combined to form blocking keys. The Blocking Key Element Algorithms table lists the available keying algorithms.

Note: Element Names, Element Modifiers and BlockingKeyElementAlgorithm Names specified in the blocking key specifications must be all upper case.

For example to use the MinorStreet_Number in a key it must be specified as MINORSTREET_NUMBER.
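
As an illustrative sketch of this workflow (assuming the deployment path described in the Deploying section and the default Tomcat port; adjust the host, port and version for your environment):

# Retrieve the default blocking keys for the United Kingdom
curl "http://localhost:8080/matching-rest-api-{VersionNumber}/v2/configuration/blockingKey/default/GBR" -o blockingKeys.json

# Edit blockingKeys.json to suit your input data, then submit the modified keys
curl -X POST "http://localhost:8080/matching-rest-api-{VersionNumber}/v2/configuration/blockingKey" \
     -H "Content-Type: application/json" -d @blockingKeys.json

The response to the POST allocates an ID to each key, which is then referenced in the session configuration's blockingKeyId array.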

Blocking Key Element Algorithms

The keying of each element is defined by a BlockingKeyElementAlgorithm object. If no BlockingKeyElementAlgorithm is specified the SIMPLIFIED_STRING algorithm is used.

| Name | Description | Keyed example | Properties |
|---|---|---|---|
| NO_CHANGE | No modification - retains spaces | "ANDREW J" ⇒ "ANDREW J" | |
| SIMPLIFIED_STRING | Remove spaces | "ANDREW J" ⇒ "ANDREWJ" | |
| DOUBLE_METAPHONE | Double metaphone part 1 | "ANDREW J" ⇒ "ANTR" | |
| DOUBLE_METAPHONE_FIRST_WORD | Double metaphone part 1 | "ANDREW J" ⇒ "ANTR" | |
| NYSIIS | NYSIIS | "ANDREW J" ⇒ "ANDRAJ" | |
| SOUNDEX | Soundex | "ANDREW J" ⇒ "A536" | |
| CONSONANT | Only consonants | "ANDREW J" ⇒ "NDRWJ" | |
| INITIAL | Initial value | "ANDREW J" ⇒ "A" | |
| START_SUBSTRING | Substring from beginning | "ANDREW J" ("length": 3) ⇒ "AND" | "length": <integer> |
| MIDDLE_SUBSTRING | Substring from start to end | "ANDREW J" ("start": 2, "end": 5) ⇒ "NDRE" | "start": <integer>, "end": <integer> |
| END_SUBSTRING | Substring from end | "ANDREW J" ("length": 3) ⇒ "W J" | "length": <integer> |

The CONSONANT and SOUNDEX key types support the following character sets: Basic Latin (ASCII), Latin-1 Supplement, Latin Extended-A, Latin Extended Additional.

All other key types have been tested with the following Latin character sets: Basic Latin (ASCII), Latin-1 Supplement, Latin Extended-A, Latin Extended-B, Latin Extended-C, Latin Extended-D, Latin Extended-E, Latin Extended Additional, IPA Extensions, Phonetic Extensions, Phonetic Extensions Supplement.

Hashing

Blocking keys are hashed when written to the match store. This obfuscates them for security, which matters because the blocking key value must remain unencrypted even in cases where the match store is encrypted. Hashing also improves the performance of blocking: because hashing produces values of the same length, the database is able to create a more efficient index over the blocking keys.

Note: When using a relational database as a match store, hashing blocking keys will affect the type of the blocking key value column. With hashing enabled the column will be CHAR(40), when hashing is disabled it will be VARCHAR(255). When running a job with hashing enabled, it is not possible to run a job with hashing disabled against the same match store. The match store tables will have to be dropped before running a job with hashing disabled.

Hashing has no impact on Search or Transactional operations.

Output

To improve system performance, it is recommended that only the match result fields (e.g. $SOURCE_ID, $RECORD_ID, $CLUSTER_ID and $MATCH_STATUS) are configured in the output mapping.

Datasource field mapping

Field mapping refers to the database fields you would like to use for matching. The available field types that you can match on are listed below:


ID

ID of the input record. This must be unique per source.

NAME

Any name element. For best results ensure NAME fields are ordered as title → forename → surname. Cannot be used in combination with individual Name components TITLE, FORENAMES or SURNAME.

TITLE

The title element of a full name. Cannot be used in combination with any NAME fields.

FORENAMES

The given name/names of a full name. Cannot be used in combination with any NAME fields.

SURNAME

The surname/names of a full name. Cannot be used in combination with any NAME fields.

ADDRESS

Any Address element. For best results ensure input ADDRESS fields follow the standard order for that country.

PREMISE_AND_STREET

Fields that contain premise and street elements such as premise, building or po box number and a street or building name.

LOCALITY

Localities such as a town or city.

PROVINCE

Larger area of a country such as a county which contains multiple localities.

POSTCODE

Postal code.

COUNTRY

Country name, either as a full name or ISO code. To enable country processing an ISO 3166-1 alpha-3 country code must be specified.

GENERIC_STRING

A generic String element.

DATE

A simple date element. It is recommended that the data is cleansed before running a Match job. It must be in the ISO date format YYYY-MM-DD. Any subsequent characters will be removed. Please be aware that the default epoch date 1970-01-01 may match with other records that have the same date.

PHONE

Phone number.

EMAIL

Email address.

Troubleshooting

  • If your matching and output job is successful but your output data is empty or incomplete, this may mean your input fields don’t match with your data.

    • When using a FLATFILE for input, check that your input field numbers match correctly to the relevant column (column numbers start at 1).

    • If writing to a JDBC database, ensure the output table has been created with the correct columns.

  • If your matching job fails with the message [Standardisation Client] Failed to connect to server, matching cannot connect to GdqStandardize. Ensure the service is set up and running correctly by following the GdqStandardize setup instructions.

Logging

When run under Tomcat the matching.log file can be found in CATALINA_HOME\logs\matching.log. Logging is handled by the log4j framework. The logging behaviour can be changed by updating the deployed log4j2.xml, as described in the following sections.

Log levels

The log level is specified for each major component of Matching within its own section of the log4j2 configuration under the XML section '<Loggers>'. For example:

<Logger name="com.edq.matching.components.scoring" level="WARN" additivity="false">
        <AppenderRef ref="MatchingScoringLog"/>
</Logger>

This specifies that the Scoring component has a log level of 'WARN', which is the recommended default for all components. Each component can have its logging level increased or decreased to change the granularity of the log file.

The components of Experian Match that may be individually configured are:

Component Description

com.edq.matching

The overall application; the level set here is the default to be applied if none of the below are configured.

com.edq.matching.components

The main application components. This would give information regarding the processing pipeline of keying, blocking, scoring and clustering.

com.edq.matching.dataconnector

The components that connect to data sources and sinks (databases, files, etc.). This would give information regarding the status of connections to the data endpoints such as whether a particular file exists and is accessible.

com.edq.matching.api

The components that interact with the user including the REST endpoints. This would track the interactions between the user and the application.

com.edq.standardisation

The API that interfaces with the standalone standardisation component.

The log levels in log4j follow the hierarchy in the table below. Therefore if you set the log level to DEBUG, you would get all the levels below DEBUG as well.

| Level | Description |
|---|---|
| ALL | All levels |
| TRACE | Designates finer-grained informational events than DEBUG |
| DEBUG | Granular information; use this level to debug a package |
| INFO | Informational messages that highlight the progress of the application at a coarse-grained level |
| WARN | Potentially harmful situations |
| ERROR | Error events that might still allow the application to continue running |
| FATAL | Severe error events that will presumably lead the application to abort |
| OFF | The highest possible rank, intended to turn off logging |

Logging outputs

Experian Match log

By default Experian Match is set to output the logs to CATALINA_HOME\logs\matching.log. This can be changed by editing the below section of the log4j2.xml file:

<RollingFile name="MatchingLog"
                     fileName="${LOG_DIR}/matching.log"
                     filePattern="${ARCHIVE}/matching.log.%d{yyyy-MM-dd}.gz">
            <PatternLayout pattern="${PATTERN}"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="1 MB"/>
            </Policies>
            <DefaultRolloverStrategy max="2000"/>
</RollingFile>

Adjusting the fileName attribute allows you to change the name and location.

Other logs

In addition to the Experian Match log, there are the following logs generated in the same file system location:

MatchingMetricsLog

Captures metrics reporting within the application (described below), logged to 'matching-metrics.log'

MatchingScoringLog

Captures output from the Scoring processing comprising the audited candidate pairs, logged to 'matching-scoring.log'

These logs are configured in the same way as the Experian Match log.

Note that the MatchingMetricsLog will only operate if the Log Level for that Logger is set to 'DEBUG' and the Monitoring property enabled as described below.

Monitoring

Application monitoring is available, providing metrics output from the application. Output can be written to a log file, to CSV files or exposed via a web endpoint. Metrics are an advanced configuration that may be used for evaluating performance, supporting fine tuning and debugging the application.

Monitoring configuration

This facility is configured in the system 'application.properties' file via a collection of parameters. These are:

| Parameter | Default value | Description |
|---|---|---|
| matching.metrics.time.rate.seconds | 5 | Metrics measurement interval in seconds |
| matching.metrics.csv.active | false | Metrics output to CSV files is only active if this is true |
| matching.metrics.csv.path | logs/metrics | Location for CSV-format metrics files |
| matching.metrics.log.active | false | Metrics are written to the log file (MatchingMetricsLog) if this is true and the log level is DEBUG |
| matching.metrics.web.active | false | When true, metrics are available from a REST endpoint |
| matching.metrics.endpoint | /monitoring/* | The REST endpoint to access web-based metrics (will resolve to http://localhost:{port}/monitoring?pretty=true) |
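
For example, to sample metrics every 10 seconds and write them both to CSV files and to the metrics log, application.properties could contain the following (values are illustrative):

matching.metrics.time.rate.seconds=10
matching.metrics.csv.active=true
matching.metrics.csv.path=logs/metrics
matching.metrics.log.active=true

Remember that the MatchingMetricsLog Logger must also be set to DEBUG for the log output to appear.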

Monitoring output

Typical output comprises:

  • Summary of tasks performed

  • Summary of system usage (memory and cpu)

  • Job status

  • Number of threads used per component

  • Individual metrics for each component of Matching, comprising the timings and rates of processing

For the system log file and web page these are all reported together; for the CSV-based metrics, each is written to its own individual file.

Session reporting

Blocking key reporting

A reporting operation is available that provides statistics about how blocking keys were used for a matching job. An example response is given below.

The response contains two sets of blocking key statistics for each key configured, and details of the block size threshold.

blockSizeThreshold is the maximum block size for which records with the same blocking key will be considered as candidate matching pairs. Experian Match records a Threshold Exception when the number of records in a block exceeds this value. The default value of 200 can be overridden with the maxBlockSize configuration setting when creating a blocking key (see the example after the response below).

numThresholdExceptions is the number of exceptions recorded for each blocking key type.

Each set of statistics contains the number of blocks generated and the minimum, maximum, mean, median and standard deviation of block size.

statisticsExcludingExceptions are calculated using only the blocks that were included for matching because they were within the threshold.

statisticsIncludingExceptions are calculated for all the blocks that could be generated, including those above the threshold.

GET /v2/session/{sessionId}/blockingKeyStatistics
{
  "blockSizeThreshold": 200,
  "blockingKeyStatistics": [
    {
      "blockingKeyId": 1,
      "description": "ForenamesSurname",
      "numThresholdExceptions": 0,
      "statisticsExcludingExceptions": {
        "blockCount": 50,
        "maximum": 10,
        "mean": 1.3,
        "median": 1.2,
        "minimum": 1,
        "standardDeviation": 0.75
      },
      "statisticsIncludingExceptions": {
        "blockCount": 50,
        "maximum": 10,
        "mean": 1.3,
        "median": 1.2,
        "minimum": 1,
        "standardDeviation": 0.75
      }
    }
  ]
}
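As an illustrative sketch, the threshold for an individual key could be raised by supplying maxBlockSize when creating the blocking key (the values below are examples only):

POST /v2/configuration/blockingKey
[
  {
    "description": "ForenamesSurname",
    "countryCode": "GBR",
    "maxBlockSize": 500,
    "elementSpecifications": [
      { "elementType": "FORENAMES" },
      { "elementType": "SURNAME" }
    ]
  }
]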

Matching job management

Match jobs allow you to perform workflow steps to build clusters for all records in the configured datasources, to output the clustered records, or to perform both steps with a single call.

Note: The system currently only supports running one job at a time.

  • Any other created jobs are added to a queue with a status of PENDING, and are run sequentially in order of job creation (see the example after this list).

  • Just like the running job, pending jobs can be cancelled if no longer required.

  • This does NOT apply to maintenance operations; these will still be immediately rejected if another job is already running, and will not be queued.
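For example, a GET request to /v2/matching/job made while one job is running and another is queued might return something like the following (values are illustrative):

GET /v2/matching/job
[
  {
    "jobId": 1,
    "description": "First match and output job",
    "createTime": "2017-06-07T18:55:30.35",
    "startTime": "2017-06-07T18:55:34.87",
    "finishTime": null,
    "progress": 40,
    "message": null,
    "state": "RUNNING"
  },
  {
    "jobId": 2,
    "description": "Second match and output job",
    "createTime": "2017-06-07T18:56:02.10",
    "startTime": null,
    "finishTime": null,
    "progress": 0,
    "message": null,
    "state": "PENDING"
  }
]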

Tutorials

Performing a match and output job

The following tutorial will work through a full match and output job.

Prerequisites

  • Experian Match REST API correctly installed and deployed. This includes the installation of the standardization service. Follow the installation guide if you have not completed this. If you encounter any errors whilst working through this tutorial, check that you have correctly installed and deployed the API.

  • Postman collection and environments imported (if using Postman)

Overview

For this tutorial we will be running the API under Tomcat using the default port - in this case port 8080.

For the input we will use the example CSV file provided, connect to a disk-backed HSQL database as the match store and output the results to a new CSV file.

Our input file contains mocked up name, address and email data from the United Kingdom. By performing a match and output job we will be able to see where our data matches, ultimately providing an output file demonstrating where similar records have been clustered together. This would allow us to assess, remove or combine our duplicate records.

You can see a 6 row sample below:

RECORDID | NAME              | ADDRESS1    | ADDRESS2       | ADDRESS3     | TOWN      | PROVINCE | POSTCODE | EMAIL                    | DOB
123514   | Mrs Lydia Bright  | Old Brewery | 2 The Maltings | Milton Abbas | BLANDFORD |          | DT11 0BH | lydia@rosskleinltd.co.uk | 1985-04-17
123515   | Ms Lydia Bright   | Old Brewery | 2 The Maltings | Milton Abbas | BLANDFORD |          | DT11 0BH |                          |
123516   | Mrs Lydia Bright  | Old Brewery | 2 The Maltings |              | BLANDFORD |          |          | lydia@rosskleinltd.co.uk | 1985-04-17
123517   | Mrs L Bright      | Old Brewery | 2 The Maltings | Milton Abbas | BLANDFORD |          | DT110BH  | lydia@rosskleinltd.co.uk | 1985-04-05
123518   | Mr James Ashworth | Manor Farm  |                |              | WIGTON    |          | CA7 9RS  | james@ashworth.co.uk     | 1977-03-13
123519   | Mr James Ashworth | Manor Farm  |                |              | WIGTON    | CUMBRIA  | CA7 9RS  | james@ashworth.co.uk     | 1977-03-13

As you can see, there’s a lot of data that looks very similar. Once our job is complete, we should expect to see many of these records clustered together. Whether or not they match will depend on how strict our ruleset is.

Making requests

We can make our requests using a few different methods; you can find more information in the Making requests section of the API reference.

We have provided a Postman collection (FF-HSQL-FF) for this tutorial; you should be able to make each request in the collection sequentially to perform a match and output job. Please ensure that you have imported the Postman collections and environments correctly. The parameters provided when importing the environments will work; however, the sample requests below demonstrate the explicit values.

The simplest way to explore the API and make your requests is with the Swagger interactive documentation hosted at http://localhost:8080/matching-rest-api-2.7.1/swagger-ui.html. You should be able to copy and paste the objects from the tutorial into the relevant requests; this will give you a better understanding of each request and the workflow.

1. Configuring your data source

The first part of configuring your job is to configure the data source. Our data source will be the provided CSV file - DEMO_100.csv.

For simplicity, we will store our input and subsequent output files in C:\temp.

To make a data source configuration request we need to make a POST request to /v2/configuration/datasource. As we are running Experian Match on Tomcat, with the webapp deployed at the ROOT, the full URL for this request is http://localhost:8080/v2/configuration/datasource. When using the Postman collection we want to use the "Post connection settings for INPUT FLATFILE" request.

We need to post an object where we specify how we connect to our input, in this case our CSV file. We also need to map the input fields so that the API knows what type of data we have in our input file.

Here is what our request body will look like:

{
  "connection": {
    "connectionSettings": {
      "Path": "C:\\temp\\DEMO_100.csv",
      "Header": "true",
      "Quote": "\""
    },
    "connectionType": "FLATFILE"
  },
  "description": "A FLATFILE input connection",
  "fieldMappings": [
    {
      "field": 1,
      "fieldType": "ID"
    },
    {
      "field": 2,
      "fieldType": "NAME"
    },
    {
      "field": 3,
      "fieldType": "ADDRESS"
    },
    {
      "field": 4,
      "fieldType": "ADDRESS"
    },
    {
      "field": 5,
      "fieldType": "ADDRESS"
    },
    {
      "field": 6,
      "fieldType": "LOCALITY"
    },
    {
      "field": 7,
      "fieldType": "PROVINCE"
    },
    {
      "field": 8,
      "fieldType": "POSTCODE"
    }
  ]
}

As you can see we have set our connectionType as FLATFILE and we have set the path to that of our CSV file.

We have also set our field mappings to match that of our input file. For this example we want to match on name and address and so we have mapped the relevant fields.

The response body we get from the API is:

{
  "datasourceId": 1,
  "description": "A FLATFILE input connection",
  "connection": {
    "connectionType": "FLATFILE",
    "connectionSettings": {
      "Path": "C:\\temp\\DEMO_100.csv",
      "Header": "true",
      "Quote": "\""
    }
  },
  "fieldMappings": [
    {
      "field": "1",
      "fieldType": "ID",
      "fieldGroup": null
    },
    {
      "field": "2",
      "fieldType": "NAME",
      "fieldGroup": null
    },
    {
      "field": "3",
      "fieldType": "ADDRESS",
      "fieldGroup": null
    },
    {
      "field": "4",
      "fieldType": "ADDRESS",
      "fieldGroup": null
    },
    {
      "field": "5",
      "fieldType": "ADDRESS",
      "fieldGroup": null
    },
    {
      "field": "6",
      "fieldType": "LOCALITY",
      "fieldGroup": null
    },
    {
      "field": "7",
      "fieldType": "PROVINCE",
      "fieldGroup": null
    },
    {
      "field": "8",
      "fieldType": "POSTCODE",
      "fieldGroup": null
    }
  ]
}

The response confirms our connection type and field mappings as well as providing us with a data source ID. Make a note of this ID; we will need it later on.

2. Configuring your output

Next we need to configure our output. As we are outputting to a flat file we will configure an output CSV.

We need to post another object to our API, this time to http://localhost:8080/v2/configuration/output. When using the Postman collection we want to use the "Post connection settings for OUTPUT FLATFILE" request.

This object needs to include connection settings for our output file as well as mapping our input fields to our desired output fields.

Our request body will look like the below:

{
  "connection": {
    "connectionSettings": {
      "Path": "C:\\temp\\DEMO_100_output.csv",
      "Header": "true"
    },
    "connectionType": "FLATFILE"
  },
  "overwriteExisting": true,
  "description": "A FLATFILE output connection",
  "outputMapping": [
    {
      "inputField": [
        {
          "field": "$RECORD_ID",
          "source": 0
        }
      ],
      "outputField": "RecordID"
    },
    {
      "inputField": [
        {
          "field": 2,
          "source": 1
        }
      ],
      "outputField": "Name"
    },
    {
      "inputField": [
        {
          "field": 3,
          "source": 1
        }
      ],
      "outputField": "Address1"
    },
    {
      "inputField": [
        {
          "field": 4,
          "source": 1
        }
      ],
      "outputField": "Address2"
    },
    {
      "inputField": [
        {
          "field": 5,
          "source": 1
        }
      ],
      "outputField": "Address3"
    },
    {
      "inputField": [
        {
          "field": 6,
          "source": 1
        }
      ],
      "outputField": "Town"
    },
    {
      "inputField": [
        {
          "field": 7,
          "source": 1
        }
      ],
      "outputField": "County"
    },
    {
      "inputField": [
        {
          "field": 8,
          "source": 1
        }
      ],
      "outputField": "Postcode"
    },
    {
      "inputField": [
        {
          "field": 9,
          "source": 1
        }
      ],
      "outputField": "Email"
    },
    {
      "inputField": [
        {
          "field": 10,
          "source": 1
        }
      ],
      "outputField": "DateOfBirth"
    },
    {
      "inputField": [
        {
          "field": "$CLUSTER_ID",
          "source": 0
        }
      ],
      "outputField": "Cluster_ID"
    },
    {
      "inputField": [
        {
          "field": "$MATCH_STATUS",
          "source": 0
        }
      ],
      "outputField": "Match_Status"
    }
  ]
}

As you can see, some of our output fields actually differ from our data source fields. Experian Match allows us to output whichever fields we want from our input file, even if they haven’t been used for matching. For example, we are outputting the EMAIL and DOB fields; this is information we want in our output file, but we don’t want the API to use it for matching on this occasion, hence its omission from our input mapping.

We have also included a Cluster_ID, Match_Status and Record_ID output field. These fields take $CLUSTER_ID, $MATCH_STATUS and $RECORD_ID as input fields respectively. Fields that start with $ are calculated or system fields; as such, both IDs and the match status are returned by the API rather than derived from the original data source, so we set their source to 0. For all other input fields we set the source to 1 to match the datasourceId returned to us when configuring our data source.

The response body we get from the API is:

{
  "outputId": 1,
  "description": "A FLATFILE output connection",
  "connection": {
    "connectionType": "FLATFILE",
    "connectionSettings": {
      "Path": "C:\\temp\\DEMO_100_output.csv",
      "Header": "true"
    }
  },
  "outputMapping": [
    {
      "inputField": [
        {
          "field": "$RECORD_ID",
          "source": 0
        }
      ],
      "outputField": "RecordID"
    },
    {
      "inputField": [
        {
          "field": 2,
          "source": 1
        }
      ],
      "outputField": "Name"
    },
    {
      "inputField": [
        {
          "field": 3,
          "source": 1
        }
      ],
      "outputField": "Address1"
    },
    {
      "inputField": [
        {
          "field": 4,
          "source": 1
        }
      ],
      "outputField": "Address2"
    },
    {
      "inputField": [
        {
          "field": 5,
          "source": 1
        }
      ],
      "outputField": "Address3"
    },
    {
      "inputField": [
        {
          "field": 6,
          "source": 1
        }
      ],
      "outputField": "Town"
    },
    {
      "inputField": [
        {
          "field": 7,
          "source": 1
        }
      ],
      "outputField": "County"
    },
    {
      "inputField": [
        {
          "field": 8,
          "source": 1
        }
      ],
      "outputField": "Postcode"
    },
    {
      "inputField": [
        {
          "field": 9,
          "source": 1
        }
      ],
      "outputField": "Email"
    },
    {
      "inputField": [
        {
          "field": 10,
          "source": 1
        }
      ],
      "outputField": "DateOfBirth"
    },
    {
      "inputField": [
        {
          "field": "$CLUSTER_ID",
          "source": 0
        }
      ],
      "outputField": "Cluster_ID"
    },
    {
      "inputField": [
        {
          "field": "$MATCH_STATUS",
          "source": 0
        }
      ],
      "outputField": "Match_Status"
    }
  ],
  "filter": "ALL",
  "overwriteExisting": true
}

Once again our response confirms our output connection settings and field mappings. We also get an output ID - as with the data source ID we will use this for configuring our session.

We should expect to see these fields in our output file:

RecordID Name Address1 Address2 Address3 Town County Postcode Email DateOfBirth Cluster_ID Match_Status

3. Configuring your rules

We now need to configure the rules we will be using to control the stringency of our matching job. We do this by making a POST request to http://localhost:8080/v2/configuration/rule. When using the Postman collection we want to use the "Post rules" request.

We need to provide a JSON-escaped string containing our rules. We have used the default ruleset for the United Kingdom, which you can find more information about in our rules section. We can also give our rules a description; this is useful if we plan to use different rulesets for subsequent jobs.

Our request object will look like the below:

{
  "description": "Default Matching Rules for the United Kingdom",
  "ruleVersion": "v1",
  "rules": "<As in file>"
}

The response body we will get from the API is:

{
  "ruleSetId": 1,
  "description": "Default Matching Rules for the United Kingdom",
  "rules": "<As in file>",
  "ruleVersion": "v1"
}

Our response will return the description, rules and version we posted along with a ruleSet ID which will be used when we configure our session.

4. Configuring the blocking keys

We will use the default United Kingdom blocking keys, obtained from a GET request to http://localhost:8080/v2/configuration/blockingKey/default/GBR:

[
  {
    "description": "FullPostcode",
    "countryCode": "GBR",
    "maxBlockSize": 200,
    "elementSpecifications": [
      {
        "elementType": "POSTCODE",
        "includeFromNChars": 5,
        etc...

To use these blocking keys with our session we need to make a POST request to http://localhost:8080/v2/configuration/blockingKey, including the keys in the request body. Copy the response from the GET request and POST it to create the keys, as sketched below. The newly generated keys will each have a blockingKeyId for use in the session configuration.
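The POST body is simply the array of keys returned by the GET request. A truncated sketch of the request is shown below; the full array should be taken from your own GET response:

POST /v2/configuration/blockingKey
[
  {
    "description": "FullPostcode",
    "countryCode": "GBR",
    "maxBlockSize": 200,
    "elementSpecifications": [
      {
        "elementType": "POSTCODE",
        "includeFromNChars": 5
      }
    ]
  }
  // ...the remaining keys from the GET response
]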

5. Configuring your session

Now that we have configured our data source, output, rules and blocking keys we can configure our session.

We need to post the IDs returned to us in our previous requests as well as providing connection settings for our SQL database to be used as the match store. We make the POST request to http://localhost:8080/v2/session. When using the Postman collection we want to use the "Post session" request.

For the blocking keys we want to use IDs 1 to 9, which we created earlier. As we haven’t configured any other data sources, outputs or rules, our IDs should all be 1. We can also make use of the HSQL functionality provided by the API. This allows us to specify a disk-backed store.

Our request body will look like the below:

{
  "datasourceIds": [
    1
  ],
  "description": "A FF-HSQL-FF matching session",
  "matchStoreConnection": {
    "connectionSettings": {
      "JdbcUrl": "jdbc:hsqldb:file:c:\\temp\\hsqldb",
      "JdbcDriver": "org.hsqldb.jdbc.JDBCDriver",
      "UserName": "sa",
      "Password": "",
      "Table": "DEMO_100_FF"
    },
    "connectionType": "JDBC"
  },
  "outputId": 1,
  "ruleSetId": 1,
  "blockingKeyIds": [
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9
  ]
}

As we are making use of HSQL, we have set the connection type to JDBC and our JdbcUrl to a file in our C:\temp folder. We also need to provide the right driver; the HSQL driver is packaged with the API.

The response body we get from the API is:

{
  "sessionId": 1,
  "description": "A FF-HSQL-FF matching session",
  "matchStoreConnection": {
    "connectionType": "JDBC",
    "connectionSettings": {
      "JdbcUrl": "*****",
      "JdbcDriver": "org.hsqldb.jdbc.JDBCDriver",
      "UserName": "*****",
      "Password": "*****",
      "Table": "DEMO_100_FF"
    }
  },
  "blockingKeyIds": [
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9
  ],
  "datasourceIds": [
    1
  ],
  "outputId": 1,
  "ruleSetId": 1
}

Our response will return us the same JSON as we posted along with a session ID. We will use this sessionID when performing a match job.

6. Run your match and output job

Our final step is to run the match and output job. We need to post the sessionID to http://localhost:8080/v2/matching/job/matchAndOutput. When using the Postman collection we want to use the "Match and Output" request.

We can also give our job a description and specify a callback URI; we will leave the callback blank for this tutorial.

Our request body will look like the below:

{
  "description": "FF-SQL-FF match and output",
  "sessionId": 1
}

The initial response body we get from the API is:

{
  "jobId": 1,
  "description": "FF-SQL-FF match and output",
  "createTime":"2017-06-07T18:55:30.35",
  "startTime": "2017-06-07T18:55:34.87",
  "finishTime": null,
  "progress": 0,
  "message": null,
  "state": "PENDING"
}

The response shows:

  • the unique job ID

  • the job description

  • the time that the job was first created

  • the time that the job was started

  • the time that the job was finished

  • the job progress

  • a message detailing the state of the job

  • the state of the job (i.e. PENDING/RUNNING/CANCELLED/FAILED/FINISHED)

To check the status of our job we need to make a GET request to http://localhost:8080/v2/matching/job/1. The 1 refers to the jobID returned to us when we first scheduled the job. You could also make a GET request to http://localhost:8080/v2/matching/job to return the status of all jobs.

We will get the following response if our job was successful:

{
  "jobId": 1,
  "description": "FF-SQL-FF match and output",
  "createTime":"2017-06-07T18:55:30.35",
  "startTime": "2017-06-07T18:55:34.87",
  "finishTime": "2017-06-07T18:55:59.381",
  "progress": 100,
  "message": "Job Complete.",
  "state": "FINISHED"
}

We will now be able to navigate to our C:\temp folder and view our DEMO_100_output.csv output file. If you sort the output file by Cluster_ID you will be able to see which records have been clustered together.

A 6 row sample of the output is shown below:

Name              | Address1    | Address2       | Address3     | Town      | County  | Postcode | Email                    | DateOfBirth | RecordID | Cluster_ID | Match_Status
Mrs Lydia Bright  | Old Brewery | 2 The Maltings | Milton Abbas | BLANDFORD |         | DT11 0BH | lydia@rosskleinltd.co.uk | 1985-04-17  | 123514   | 17         | CLOSE
Ms Lydia Bright   | Old Brewery | 2 The Maltings | Milton Abbas | BLANDFORD |         | DT11 0BH |                          |             | 123515   | 17         | CLOSE
Mrs Lydia Bright  | Old Brewery | 2 The Maltings |              | BLANDFORD |         |          | lydia@rosskleinltd.co.uk | 1985-04-17  | 123516   | 17         | PROBABLE
Mrs L Bright      | Old Brewery | 2 The Maltings | Milton Abbas | BLANDFORD |         | DT110BH  | lydia@rosskleinltd.co.uk | 1985-04-05  | 123517   | 17         | CLOSE
Mr James Ashworth | Manor Farm  |                |              | WIGTON    |         | CA7 9RS  | james@ashworth.co.uk     | 1977-03-13  | 123518   | 3          | EXACT
Mr James Ashworth | Manor Farm  |                |              | WIGTON    | CUMBRIA | CA7 9RS  | james@ashworth.co.uk     | 1977-03-13  | 123519   | 3          | EXACT

Experian Match has clustered a number of our records together, clearly showing that, as we suspected, we had duplicate data in our data source.

Match store searching

After a match job has been run, you can search the match store to find records which could potentially match against your target record. This allows the Match Store to be checked for existing matches and helps prevent duplicates from being entered into it. Searching may also be a pre-condition for performing transactional maintenance.

Basic searching

Searches can be performed against a built Match Store with very little configuration. Using the example on the Overview page, a search can be made as follows:

POST /v2/search/1
{
  "search": {
    "Forename": "Joe",
    "Surname": "Bloggs",
    "Address1": "10 Downing St",
    "Address2": "London"
  }
}

The session ID is passed as a path parameter. The contents of search is a schema-less JSON object whose keys correspond to the input mappings specified in the Data Source field mappings. Here, Address3 was not specified; it can be omitted and will be treated as blank when searching.

The response object will look like this:

{
  "results": [
  {
    "record": {
        "Cust_ID": "1",
        "FirstName": "Joseph",
        "LastName": "Bloggs",
        "Cluster": 1
    },
    "matchStatus": "EXACT"
  },
  {
    "record": {
        "Cust_ID": "2",
        "FirstName": "Joe",
        "LastName": "Bloggs",
        "Cluster": 1
    },
    "matchStatus": "PROBABLE"
  },
  {
    "record": {
        "Cust_ID": "3",
        "FirstName": "Jo",
        "LastName": "Blog",
        "Cluster": 2
    },
    "matchStatus": "PROBABLE"
  } // etc...
  ]
}

Without any additional configuration, the response will be a list of objects, each containing a record and its corresponding match status with respect to the search term. Each record could potentially match the provided search term and contains the fields defined in the Output Mapping. If no output mapping has been configured (i.e. because you have only run a MATCH job), then the output will contain the fields recordID, sourceID and clusterID.

Search results are returned in descending order of match status, so EXACT matches are returned first in the collection.

Potential matches are determined using the rule set configured in the Session.

If you wish to override the fields used for searching, the rules used for matching, or the output fields, additional configuration is required.

Advanced searching

To use different field mappings from your match job, you must create a new data source, and register it with a new session.

First, create a new data source configuration. Refer to Overview Section 1: Data source configuration. You can specify REST as a data source type, to prevent this configuration from being used in normal match or matchAndOutput jobs.

POST /v2/configuration/datasource
{
  "description": "A data source use only for Search",
  "connection": {
    "connectionType": "REST"
  },
  "fieldMappings": [
    // etc...
  ]
}

You also don’t need to supply any connectionSettings, as there are none.

For output mappings, follow a similar approach of using REST as the connection type (see the sketch after the note below).

Note
If you have specified a REST data source type, you cannot use this as an "inputField" source in the output mapping. Doing so will result in an error.
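As a sketch of what such an output configuration might look like (the field numbers and output names here are purely illustrative), note that the inputField sources still reference the original, non-REST data source:

POST /v2/configuration/output
{
  "description": "An output mapping used only for Search",
  "connection": {
    "connectionType": "REST"
  },
  "outputMapping": [
    {
      "inputField": [
        {
          "field": 2,
          "source": 1
        }
      ],
      "outputField": "Name"
    }
    // etc...
  ]
}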

If you wish to use separate rules or separate blocking keys for matching, create these configurations in the standard way. See the tuning section for more information.

Next, you need to create a new Session to contain your overridden settings:

POST /v2/session
{
  "description": "Search session",

  // These settings should be the same as your existing Session.
  "matchStoreConnection": {
    "connectionSettings": {
      "JdbcUrl": "jdbc:sqlserver://DBSERVER;jdbcDatabaseName=CustomerDB",
      "JdbcDriver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
      "Table" : "Cust_Index",
      "UserName" : "matchingUser",
      "Password" : "Password123"
    },
    "connectionType": "JDBC"
  },
  "datasourceIds": [
    1, // The existing data source ID
    2  // Your new data source ID (REST)
  ],
  "ruleSetId": 2, // Your new rule set ID
  "outputId": 2, // Your new output mapping (REST)
  "blockingKeyIds": [
      1,
      2
    ] // Your blocking key IDs
}

Your search can then be performed using this new Session, which will have the new field mappings for the search terms.
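For example, if the new session was returned with ID 2 and the REST data source mapped fields named Forename, Surname and Postcode (hypothetical names for this sketch), a search against it might look like:

POST /v2/search/2
{
  "search": {
    "Forename": "Joe",
    "Surname": "Bloggs",
    "Postcode": "SW1A 2AA"
  }
}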

Find by Cluster ID or Record ID.

After a match job has run, you may wish to find clusters from a single Record ID or a Cluster ID.

When searching using a Cluster ID, the result will be all records in the specified cluster. The path parameters are the Session ID and the Cluster ID being searched for.

GET /v2/search/1/23

The response object will look like this:

{
  "results": [{
    "Cust_ID": "1",
    "FirstName": "Joseph",
    "LastName": "Bloggs",
    "Cluster": 1
  } // etc...
  ]
}

When searching using a Record ID, the result will be all records in the same cluster, including the specified record ID. The path parameters are the Session ID, and the Source ID and Record ID being searched for.

GET /v2/search/1/1/123

The response object will look like this:

{
  "results": [{
    "Cust_ID": "1",
    "FirstName": "Joseph",
    "LastName": "Bloggs",
    "Cluster": 1
  } // etc...
  ]
}

Transactional maintenance

After a match job has been run, you can maintain data in the match store to bring it into line with changes in the source data. Transactional maintenance functionality is presented via the REST interface, allowing Records to be added to, updated in or deleted from the Match Store. The Match Store search functions may be used to identify Records requiring maintenance.

In performing a maintenance operation, the request will apply all relevant Matching pipeline tasks to the record. This includes Standardisation, Keying and Blocking where the record is being added or updated. Finally, the record is re-scored and clustered, potentially leading to changes to any other Records clustered with or previously clustered with that Record.

Caveats

  • In order to perform maintenance on the Match Store there should be no jobs already running

  • The maintenance function will lock the Match Store for the duration of the request

  • Actions against the Match Store are performed as atomic transactions, rolling back in the event of any error

  • The data source pointed to by the datasource id must not be a flat file source

Available functions

addMatchStoreRecord

This function captures the addition of a new Record to a data source.

An example request:

POST /v2/matching/addMatchStoreRecord
{
	"sessionId": "1",
	"datasourceId": "1",
	"recordId": "123458"
}

Where:

  • sessionId: the configured session

  • datasourceId: the data source containing the Record being added

  • recordId: the new unique Id

This operation involves Standardisation, Keying and Blocking of the new Record followed by re-scoring and clustering with all Records related by scoring.

updateMatchStoreRecord

This function captures the case of some change to an existing Record in a data source. Typically this would represent a change to some attribute(s) on the Record that may affect the match characteristics of the Record.

An example request:

POST /v2/matching/updateMatchStoreRecord
{
	"sessionId": "1",
	"datasourceId": "1",
	"recordId": "123458"
}

Where:

  • sessionId: the configured session

  • datasourceId: the data source containing the Record being updated

  • recordId: the existing unique Id

This operation involves Standardisation, Keying and Blocking of the updated Record followed by re-scoring and clustering with all Records related by scoring.

deleteMatchStoreRecord

This function captures the case of an existing Record being removed from a data source.

An example request:

POST /v2/matching/deleteMatchStoreRecord
{
	"sessionId": "1",
	"datasourceId": "1",
	"recordId": "123458"
}

Where:

  • sessionId: the configured session

  • datasourceId: the data source containing the Record being removed

  • recordId: the existing unique Id

This operation involves re-clustering of all Records related to the Record being deleted.

Results

A result JSON object is produced upon successful completion of the maintenance request. This contains the identifier of the affected Record in the Match Store along with the clustering outcome (see below), the Cluster Id to which the Record has been allocated and a collection containing the full set of Record ids in the same Cluster. In addition, a changes collection contains the Clusters and their member identifiers that have been affected by the operation.

Clustering outcome

When a maintenance operation is performed, the final stage is to regenerate the Clusters for all Records affected by the operation, i.e. scored together or previously clustered with the Record in question. The outcome of re-clustering may be one of several cases.

  • ADD_NEW: a new Cluster has been generated for this Record (i.e. an Add of a Record with no other matches)

  • ADD_EXISTING: this Record has been added to an existing Cluster (i.e. an Add of a Record along with its matches)

  • MERGE: the transaction has led to two Clusters merging (e.g. the Record forms a bridge between the two)

  • SPLIT: the transaction has led to a Cluster splitting (e.g. the Record previously formed a bridge between the two)

  • DELETE_RECORD: this Record has been removed from an existing Cluster but other Records remain in that Cluster

  • DELETE_CLUSTER: this Record has been removed from an existing Cluster and no other Records remain in that Cluster

  • COMPLEX: some combination of more than one of the above cases occurred in the same transaction

  • NO_CHANGE: no changes occurred to the clustering arrangements

Example

The below result shows the outcome for an update where the change to the Record has caused an existing Cluster to split into two. The changes collection shows all affected Records with their previous and new Cluster ids.

In detail, the original state has a Cluster comprising a number of records. When the new Record was presented, it caused two of the original Cluster’s Records to be split out of that Cluster into a new one, along with the new Record. In the changes collection we can see these two Records recording the original Cluster Id and the new one as well as the new Record, whose original Cluster Id is null. Records remaining in the original Cluster are not shown as they have not been changed.

{
    "results": [
        {
            "outcome": "SPLIT",
            "recordId": {
                "recId": "123470",
                "source": 1
            },
            "clusterId": 4,
            "recordsInCluster": [
                {
                    "recId": "123469",
                    "source": 1
                },
                {
                    "recId": "123471",
                    "source": 1
                },
                {
                    "recId": "123470",
                    "source": 1
                }
            ],
            "changes": [
                {
                    "recordId": {
                        "recId": "123471",
                        "source": 1
                    },
                    "prevClusterId": 1,
                    "clusterId": 4
                },
                {
                    "recordId": {
                        "recId": "123470",
                        "source": 1
                    },
                    "prevClusterId": null,
                    "clusterId": 4
                },
                {
                    "recordId": {
                        "recId": "123469",
                        "source": 1
                    },
                    "prevClusterId": 1,
                    "clusterId": 4
                }
            ],
						"oldClusters": [
							 {
									 "clusterId": 1,
									 "recordsInCluster": [
											 {
													 "recId": "123469",
													 "source": 1
											 },
											 {
													 "recId": "123471",
													 "source": 1
											 }
									 ]
							 }
					 ],
					 "newClusters": [
							 {
									 "clusterId": 4,
									 "recordsInCluster": [
											 {
													 "recId": "123469",
													 "source": 1
											 },
											 {
													 "recId": "123470",
													 "source": 1
											 },
											 {
													 "recId": "123471",
													 "source": 1
											 }
									 ]
							 }
					 ]
        }
    ]
}

Errors

A transactional request will be rejected in any of the following circumstances:

  • The specified session is not configured

  • The specified data source is not configured or cannot be accessed

  • The Match Store is unavailable

  • Another job is already in progress

  • The request is to Add but the Record Id already exists for that source in the Match Store

  • The request is to Add or Update but the Record Id does not exist at the specified source

  • The request is to Update or Delete but the Record Id does not exist for that source in the Match Store

  • The request is to Add or Update but the Record Id exists more than once at the specified source

Note that for a Delete no check is made that the Record Id has been deleted from the data source: this is the responsibility of the client workflow.

API Reference

Base URL

As an example, a deployment on the default Tomcat port would have the base URL http://localhost:8080/matching-rest-api-2.7.1/

Making requests

If you would like to make requests to the API without integrating you can use one of the following methods:

Using the Swagger-UI interactive documentation

Using the Postman collection

  • Before using Postman to make requests, you will need to import the Matching collections. You can do this by clicking import in the top left and navigating to the postman collection JSON files provided in the package. These can be found under Integration Samples\Postman\collections.

  • It is recommended that you use the Environments feature of Postman to parameterise values such as the hostname so you can use the same scripts against dev and production instances. You can import pre-configured environments and parameters from the package under Integration Samples\Postman\environments. To import environments, click the cog in the top right and select Manage Environments.

  • The Postman collection contains example requests for setting up input, index, and output configurations, and then running a full matching job.

  • The collection is set up to use the provided sample data which should be copied to C:\temp\.

  • The collection contains a set of best practice rules for the United Kingdom.

Using the Swagger-CodeGen program

  • Generate a client library in your choice of several languages using the code generator, which can be downloaded from https://github.com/swagger-api/swagger-codegen.

  • Follow the documentation and it will generate a client library for your language of choice.

  • You can generate a C# client by starting the Matching service and running the following command:
    java -jar swagger-codegen-cli-2.2.1.jar generate -i http://localhost:{port}/matching-rest-api-2.7.1/v2/api-docs -l csharp -o MatchingClient

    • It’s possible to customise the code generated by this tool. For example, here’s how you could generate a C# client for .NET v3.5: java -jar swagger-codegen-cli-2.2.1.jar generate -i http://localhost:{port}/matching-rest-api-2.7.1/v2/api-docs -l csharp -o MatchingClient -DtargetFramework="v3.5"

    • For more information about customising the generator, see the Swagger Codegen documentation.

Resources

Configuration

Configuration for input data source, output sink and rule set.

Get all output configurations.
GET /v2/configuration/output
Responses
HTTP Code Description Schema

200

OK

OutputConfigModel array

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Create a new configuration for connecting to an output sink.
POST /v2/configuration/output
Parameters
Type Name Description Required Schema Default

BodyParameter

configuration

A description of where to store output data.

true

OutputConfigModel

Responses
HTTP Code Description Schema

200

OK

OutputConfigModel

201

Configuration created successfully.

OutputConfigModel

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get the best practice blocking keys by Country ISO 3166-1 alpha-3.
GET /v2/configuration/blockingKey/default/{countryISO3}
Description

Returns the best practice blocking keys per country.

Parameters
Type Name Description Required Schema Default

PathParameter

countryISO3

Country of configuration

true

string

Responses
HTTP Code Description Schema

200

OK

BlockingKeyConfigModel array

404

No Blocking keys found for specified country.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get all blocking keys
GET /v2/configuration/blockingKey
Description

Returns all the blocking keys.

Responses
HTTP Code Description Schema

200

OK

BlockingKeyConfigModel array

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Creates one or more blocking key configurations.
POST /v2/configuration/blockingKey
Description

Creates blocking key configurations given an array of specifications.

Parameters
Type Name Description Required Schema Default

BodyParameter

configurations

Blocking key configuration array.

true

BlockingKeyConfigModel array

Responses
HTTP Code Description Schema

200

OK

BlockingKeyConfigModel array

201

Configuration created successfully.

BlockingKeyConfigModel array

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Delete a configuration by ID
DELETE /v2/configuration/output/{outputId}
Parameters
Type Name Description Required Schema Default

PathParameter

outputId

ID of configuration

true

integer (int32)

Responses
HTTP Code Description Schema

200

OK

HttpEntity

204

Successfully deleted

HttpEntity

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Update an output connection configuration by ID.
PUT /v2/configuration/output/{outputId}
Parameters
Type Name Description Required Schema Default

PathParameter

outputId

ID of configuration

true

integer (int32)

BodyParameter

configuration

A description of where and how to store output data.

true

OutputConfigModel

Responses
HTTP Code Description Schema

200

Success

OutputConfigModel

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get an output connection configuration by ID.
GET /v2/configuration/output/{outputId}
Parameters
Type Name Description Required Schema Default

PathParameter

outputId

ID of configuration

true

integer (int32)

Responses
HTTP Code Description Schema

200

Success

OutputConfigModel

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Deletes a blocking key configuration by ID
DELETE /v2/configuration/blockingKey/{blockingKeyId}
Description

Deletes a specified blocking key configuration.

Parameters
Type Name Description Required Schema Default

PathParameter

blockingKeyId

ID of configuration

true

integer (int32)

Responses
HTTP Code Description Schema

200

OK

HttpEntity

204

Successfully deleted

HttpEntity

404

BlockingKey with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get a blocking key configuration by ID.
GET /v2/configuration/blockingKey/{blockingKeyId}
Description

Returns the specified blocking key configuration.

Parameters
Type Name Description Required Schema Default

PathParameter

blockingKeyId

ID of configuration

true

integer (int32)

Responses
HTTP Code Description Schema

200

OK

BlockingKeyConfigModel

404

BlockingKey with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Delete a rule by ID
DELETE /v2/configuration/rule/{ruleId}
Parameters
Type Name Description Required Schema Default

PathParameter

ruleId

ID of rule.

true

integer (int32)

Responses
HTTP Code Description Schema

200

OK

HttpEntity

204

Successfully deleted

HttpEntity

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Update a rule by ID.
PUT /v2/configuration/rule/{ruleId}
Parameters
Type Name Description Required Schema Default

PathParameter

ruleId

ID of configuration

true

integer (int32)

BodyParameter

configuration

A description of matching rules.

true

RuleConfigModel

Responses
HTTP Code Description Schema

200

Success

RuleConfigModel

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get a rule by ID
GET /v2/configuration/rule/{ruleId}
Parameters
Type Name Description Required Schema Default

PathParameter

ruleId

ID of configuration

true

integer (int32)

Responses
HTTP Code Description Schema

200

Success

RuleConfigModel

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get all rule configurations
GET /v2/configuration/rule
Responses
HTTP Code Description Schema

200

OK

RuleConfigModel array

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Create a new configuration that defines matching rules.
POST /v2/configuration/rule
Parameters
Type Name Description Required Schema Default

BodyParameter

configuration

A description of matching rules.

true

RuleConfigModel

Responses
HTTP Code Description Schema

200

OK

RuleConfigModel

201

Configuration created successfully.

RuleConfigModel

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get the best practice blocking keys.
GET /v2/configuration/blockingKey/default
Description

Returns the best practice blocking keys.

Responses
HTTP Code Description Schema

200

OK

BlockingKeyConfigModel array

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get all data source configurations.
GET /v2/configuration/datasource
Responses
HTTP Code Description Schema

200

OK

DataSourceConfigModel array

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Create a new configuration to connect to a data source.
POST /v2/configuration/datasource
Parameters
Type Name Description Required Schema Default

BodyParameter

configuration

A description of how to connect to the data source, what the data represents, and how it should be used.

true

DataSourceConfigModel

Responses
HTTP Code Description Schema

200

OK

DataSourceConfigModel

201

Configuration created successfully.

DataSourceConfigModel

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Delete a configuration by ID.
DELETE /v2/configuration/datasource/{datasourceId}
Parameters
Type Name Description Required Schema Default

PathParameter

datasourceId

ID of configuration.

true

integer (int32)

Responses
HTTP Code Description Schema

200

OK

HttpEntity

204

Successfully deleted.

HttpEntity

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Update a data source configuration by ID.
PUT /v2/configuration/datasource/{datasourceId}
Description

Sets a data source configuration using the supplied object. The updated values will be returned, however sensitive connection details such as a username or password will be masked.

Parameters
Type Name Description Required Schema Default

PathParameter

datasourceId

ID of configuration.

true

integer (int32)

BodyParameter

configuration

A description of how to connect to the data source, what the data represents, and how it should be used.

true

DataSourceConfigModel

Responses
HTTP Code Description Schema

200

Success

DataSourceConfigModel

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get a data source configuration by ID.
GET /v2/configuration/datasource/{datasourceId}
Description

Returns a data source configuration object. Sensitive connection details such as a username or password will be masked.

Parameters
Type Name Description Required Schema Default

PathParameter

datasourceId

ID of configuration.

true

integer (int32)

Responses
HTTP Code Description Schema

200

Success

DataSourceConfigModel

404

Configuration with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Matching

Run a full matching job or individual stages.

Get all matching jobs
GET /v2/matching/job
Responses
HTTP Code Description Schema

200

OK

JobStatus array

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Schedule a job which will find matching records and write them to the configured data output sink.
POST /v2/matching/job/matchAndOutput
Parameters
Type Name Description Required Schema Default

BodyParameter

request

The job session and configuration to use.

false

MatchJobRequest

Responses
HTTP Code Description Schema

200

OK

JobStatus

202

Success

JobStatus

400

Invalid MatchingRequest

string

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Cancel a running or pending job.
POST /v2/matching/job/{jobId}/cancel
Parameters
Type Name Description Required Schema Default

PathParameter

jobId

ID of job

true

integer (int32)

Responses
HTTP Code Description Schema

200

Success

JobStatus

404

Job with supplied ID doesn’t exist.

JobStatus

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Output matching records.
POST /v2/matching/job/outputMatches
Parameters
Type Name Description Required Schema Default

BodyParameter

request

The job session and configuration to use.

false

MatchJobRequest

Responses
HTTP Code Description Schema

200

OK

JobStatus

202

Success

JobStatus

400

Invalid MatchingRequest

string

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Schedule a job to group records from the configured data sources into clusters.
POST /v2/matching/job/match
Description

Schedule a job to read all records from the data sources configured in the session and put the records into clusters.

Parameters
Type Name Description Required Schema Default

BodyParameter

request

The job session and configuration to use.

false

MatchJobRequest

Responses
HTTP Code Description Schema

200

OK

JobStatus

202

Success

JobStatus

400

Invalid MatchingRequest

string

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get the status of a job
GET /v2/matching/job/{jobId}
Parameters
Type Name Description Required Schema Default

PathParameter

jobId

ID of job

true

integer (int32)

Responses
HTTP Code Description Schema

200

Success

JobStatus

404

Job with supplied ID doesn’t exist.

JobStatus

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Search

Search records in the system.

Find the source and record IDs of records in the specified cluster.
GET /v2/search/{sessionId}/{clusterId}
Parameters
Type Name Description Required Schema Default

PathParameter

sessionId

sessionId

true

integer (int32)

PathParameter

clusterId

clusterId

true

integer (int32)

Responses
HTTP Code Description Schema

200

Record(s) found.

SearchResponseModel

404

Cluster ID does not exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Find the cluster of records matching the search object.
GET /v2/search/{sessionId}/{sourceId}/{recordId}
Parameters
Type Name Description Required Schema Default

PathParameter

sessionId

sessionId

true

integer (int32)

PathParameter

recordId

recordId

true

string

PathParameter

sourceId

sourceId

true

integer (int32)

Responses
HTTP Code Description Schema

200

Record(s) found.

SearchResponseModel

404

Record ID does not exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Find the cluster of records matching the search object.
POST /v2/search/{sessionId}
Parameters
Type Name Description Required Schema Default

PathParameter

sessionId

sessionId

true

integer (int32)

BodyParameter

request

request

true

SearchModel

Responses
HTTP Code Description Schema

200

Record(s) found.

RealTimeSearchResponseModel

404

No matches found.

RealTimeSearchResponseModel

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Session

Configure a session to use with matching operations.

Delete a configuration by ID
DELETE /v2/session/{sessionId}
Parameters
Type Name Description Required Schema Default

PathParameter

sessionId

ID of configuration

true

integer (int32)

Responses
HTTP Code Description Schema

200

OK

HttpEntity

204

Successfully deleted

HttpEntity

404

Session with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Update a session configuration by ID.
PUT /v2/session/{sessionId}
Description

Updates a session with the supplied settings.

Parameters
Type Name Description Required Schema Default

PathParameter

sessionId

ID of configuration

true

integer (int32)

BodyParameter

configuration

A description of the session including data sources, output settings and match rules.

true

SessionConfigModel

Responses
HTTP Code Description Schema

200

Success

SessionConfigModel

400

Unable to update session.

string

404

Session with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get a session configuration by ID.
GET /v2/session/{sessionId}
Description

Returns the specified session configuration.

Parameters
Type Name Description Required Schema Default

PathParameter

sessionId

ID of configuration

true

integer (int32)

Responses
HTTP Code Description Schema

200

Success

SessionConfigModel

404

Session with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get all session configurations
GET /v2/session
Responses
HTTP Code Description Schema

200

OK

SessionConfigModel array

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Create a new session.
POST /v2/session
Parameters
Type Name Description Required Schema Default

BodyParameter

configuration

A description of the session including data sources, output settings and match rules.

true

SessionConfigModel

Responses
HTTP Code Description Schema

200

OK

SessionConfigModel

201

Configuration created successfully.

SessionConfigModel

400

Unable to create session.

string

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Get information about blocking keys from a given session
GET /v2/session/{sessionId}/blockingKeyStatistics
Parameters
Type Name Description Required Schema Default

PathParameter

sessionId

ID of configuration

true

integer (int32)

Responses
HTTP Code Description Schema

200

Success

BlockingKeyStatistics

404

Session with supplied ID doesn’t exist.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

System

Retrieve status and license information.

Get the system status.
GET /v2/system/status
Responses
HTTP Code Description Schema

200

Success

SystemStatus

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Apply an update code.
POST /v2/system/applyUpdateCode
Parameters
Type Name Description Required Schema Default

QueryParameter

updateCode

updateCode

false

string

Responses
HTTP Code Description Schema

200

Feature enabled successfully.

LicenseStatus

400

Unable to apply update code. See response for details.

LicenseStatus

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Transaction

Perform transactional maintenance of records in the system.

Delete a record from the match store.
POST /v2/transaction/delete
Parameters
Type Name Description Required Schema Default

BodyParameter

request

The session and record information for the match.

false

MatchRequest

Responses
HTTP Code Description Schema

200

Match record deleted.

TransactionalResultModel

400

Invalid MatchStoreRequest

No Content

404

Record ID does not exist.

No Content

409

Another job is already running, please try again once it has completed.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Update an existing record in the match store.
POST /v2/transaction/update
Parameters
Type Name Description Required Schema Default

BodyParameter

request

The session and record information for the match.

false

MatchRequest

Responses
HTTP Code Description Schema

200

Match record updated

TransactionalResultModel

400

Invalid MatchStoreRequest

No Content

404

Record ID does not exist.

No Content

409

Another job is already running, please try again once it has completed.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Add a new record to the match store.
POST /v2/transaction/add
Parameters
Type Name Description Required Schema Default

BodyParameter

request

The session and record information for the match.

false

MatchRequest

Responses
HTTP Code Description Schema

200

Match record added.

TransactionalResultModel

400

Invalid MatchStoreRequest

No Content

404

Record ID does not exist.

No Content

409

Another job is already running, please try again once it has completed.

No Content

Consumes
  • application/json

Produces
  • application/json;charset=UTF-8

Definitions

BlockSizeStatistics

Name Description Required Schema Default

blockCount

Number of blocks.

false

integer (int32)

maximum

Maximum block size.

false

integer (int32)

mean

Mean block size.

false

number (double)

median

Median block size.

false

number (double)

minimum

Minimum block size.

false

integer (int32)

standardDeviation

Standard deviation of block size.

false

number (double)

BlockingKeyConfigModel

Name Description Required Schema Default

blockingKeyId

Integer ID to refer to the key.

false

integer (int32)

countryCode

ISO3 code of country blocking key is to be used with.

false

string

description

Descriptive name for blocking key.

false

string

elementSpecifications

Array of element specifications. Blocking keys are built from the listed elements, in order.
For example, the key "FORENAMES+SURNAME+MINORSTREET_NUMBER" would be created from the key specification:
"elementSpecifications":[ { "elementType":"FORENAMES" },
{ "elementType":"SURNAME" },
{ "elementType":"MINORSTREET_NUMBER" }]

true

BlockingKeyElementSpecificationModel array

maxBlockSize

The maximum block size for which records with the same blocking key will be considered as candidate matching pairs. The default value is 200.

false

integer (int32)

BlockingKeyElementAlgorithm

Name Description Required Schema Default

name

KeyType algorithm name.

false

enum (NO_CHANGE, DOUBLE_METAPHONE, DOUBLE_METAPHONE_FIRST_WORD, NYSIIS, SIMPLIFIED_STRING, SOUNDEX, CONSONANT, INITIAL, START_SUBSTRING, MIDDLE_SUBSTRING, END_SUBSTRING)

properties

Optional properties for algorithm.

false

object

BlockingKeyElementSpecificationModel

Name Description Required Schema Default

algorithm

Keying algorithm used to key the element.

false

BlockingKeyElementAlgorithm

elementGroup

The group the element belongs to in the data source configuration.

false

string

elementModifiers

Array of element modifiers to use if available. The list is processed in order and the first populated normalised form for an element will be used. For example, given "elementModifierList":["STANDARDSPELLING", "DERIVED"], the standard spelling form will be used if found, followed by the derived form. If none of the modified versions are found (the default case), the key of the unprocessed element string is used.

false

enum (NONE, REMOVENOISE, STANDARDSPELLING, STANDARDABBREVIATION, ROOTNAME, STANDARDFORMAT, DERIVED) array

elementType

Type of element to key and use at this point in the blocking key.

true

enum (ID, SOURCE_ID, CLUSTER_ID, UNCLASSIFIED, COMPANY, POSITION, DIVISION, ORGANISATION, NAME, TITLE, FORENAMES, SURNAME_PREFIX, SURNAME, SURNAME_SUFFIX, HONORIFICS, GENDER, ADDRESS, PREMISE_AND_STREET, BUILDING_NUMBER, BUILDING_DESCRIPTION, BUILDING_TYPE, SUBBUILDING_NUMBER, SUBBUILDING_DESCRIPTION, SUBBUILDING_TYPE, MINORSTREET_NUMBER, MINORSTREET_PREDIRECTIONAL, MINORSTREET_DESCRIPTION, MINORSTREET_TYPE, MINORSTREET_POSTDIRECTIONAL, MAJORSTREET_NUMBER, MAJORSTREET_PREDIRECTIONAL, MAJORSTREET_DESCRIPTION, MAJORSTREET_TYPE, MAJORSTREET_POSTDIRECTIONAL, POBOX_NUMBER, POBOX_DESCRIPTION, DOUBLEDEPENDENTLOCALITY, DEPENDENTLOCALITY, LOCALITY, PROVINCE, POSTCODE, COUNTRY, COUNTRY_ISO3, PHONE, EMAIL, EMAIL_LOCAL, EMAIL_DOMAIN, GENERIC_STRING, DATE)

includeFromNChars

Only include the keyed element in a blocking key if it is N or more characters in length. Note: if specified, no blocking key is created when one or more of the configured elements do not meet this criterion.

false

integer (int32)

truncateToNChars

Truncate the keyed element to N characters in length.

false

integer (int32)
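
A sketch of a single element specification combining a keying algorithm, element modifiers and length constraints. The values are illustrative, not defaults, and the elementGroup value is a placeholder that would need to correspond to a field group defined in your datasource configuration:

    {
      "elementType": "SURNAME",
      "elementGroup": "name",
      "algorithm": { "name": "DOUBLE_METAPHONE" },
      "elementModifiers": ["STANDARDSPELLING", "DERIVED"],
      "includeFromNChars": 2,
      "truncateToNChars": 6
    }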

BlockingKeyStatistic

Name Description Required Schema Default

blockingKeyId

ID of the blocking key.

false

integer (int32)

description

Description of blocking key.

false

string

maxBlockSize

Maximum block size.

false

integer (int32)

numThresholdExceptions

Number of blocks whose size was greater than the threshold.

false

integer (int32)

statisticsExcludingExceptions

Blocking key statistics excluding exceptions.

false

BlockSizeStatistics

statisticsIncludingExceptions

Blocking key statistics including exceptions.

false

BlockSizeStatistics

BlockingKeyStatistics

Name Description Required Schema Default

blockingKeyStatistics

Statistics for the blocking key.

false

BlockingKeyStatistic array

ClusterChange

Name Description Required Schema Default

clusterId

false

integer (int32)

prevClusterId

false

integer (int32)

recordId

false

RecordId

ClusterModel

Name Description Required Schema Default

clusterId

ID of the cluster.

false

integer (int32)

recordsInCluster

Information about the records in the cluster.

false

RecordId array

ConnectionConfigModel

Name Description Required Schema Default

connectionSettings

The required settings to connect to your specific datasource.

true

object

connectionType

The type of datasource to connect to.

true

enum (FLATFILE, JDBC, MONGO, JMS, REST)

DataSourceConfigModel

Name Description Required Schema Default

connection

The type of datasource to connect to and the required settings.

true

ConnectionConfigModel

datasourceId

Datasource ID returned by the API; it is not included in the request.

false

integer (int32)

description

Description of the datasource.

false

string

fieldMappings

The field mappings for your datasource. This includes field name and type.

true

FieldMappingModel array
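
A sketch of a flat-file datasource configuration. The connectionSettings keys are hypothetical, since the settings required depend on the connection type and are not specified here; the field mappings use the documented column-number convention for flat files (starting from 1):

    {
      "description": "Customer CSV extract",
      "connection": {
        "connectionType": "FLATFILE",
        "connectionSettings": { "filename": "customers.csv" }
      },
      "fieldMappings": [
        { "field": "1", "fieldType": "ID" },
        { "field": "2", "fieldType": "FORENAMES", "fieldGroup": "name" },
        { "field": "3", "fieldType": "SURNAME", "fieldGroup": "name" },
        { "field": "4", "fieldType": "ADDRESS" },
        { "field": "5", "fieldType": "POSTCODE" }
      ]
    }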

FieldMappingModel

Name Description Required Schema Default

field

The field to be used for mapping. For flat files this will be the column number, starting from 1; for databases it will be the column name.

true

string

fieldGroup

The field group name. Can be used to group related fields together.

false

string

fieldType

The type of the field.

true

enum (ID, NAME, TITLE, FORENAMES, SURNAME, ADDRESS, PREMISE_AND_STREET, LOCALITY, PROVINCE, POSTCODE, COUNTRY, GENERIC_STRING, DATE, PHONE, EMAIL)

HttpEntity

Name Description Required Schema Default

body

false

object

JobStatus

Name Description Required Schema Default

createTime

false

string (date-time)

description

false

string

finishTime

false

string (date-time)

jobId

false

integer (int32)

message

false

string

progress

false

number (float)

startTime

false

string (date-time)

state

false

enum (PENDING, RUNNING, FINISHED, FAILED, CANCELLED)

LicenseStatus

Name Description Required Schema Default

dllLocation

false

string

isLicensed

false

boolean

licenseExpiry

false

string (date)

licenseFolder

false

string

messages

false

string

updateKey

false

string

MatchJobRequest

Name Description Required Schema Default

callbackUri

URI for a callback function.

false

string

description

Match job description.

false

string

sessionId

ID of the session to use for the match job.

true

integer (int32)
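
A sketch of a match job request. Only sessionId is required; the description and callbackUri values below are illustrative:

    {
      "sessionId": 1,
      "description": "Initial bulk match",
      "callbackUri": "http://localhost:8080/job-complete"
    }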

MatchRequest

Name Description Required Schema Default

datasourceId

The ID of the datasource.

true

integer (int32)

recordId

The ID of the record to add/update/delete.

true

string

sessionId

The ID of the session.

true

integer (int32)

OutputConfigModel

Name Description Required Schema Default

connection

The type of data output and the settings required to connect to it.

true

ConnectionConfigModel

description

Description of the output configuration.

false

string

outputId

Output configuration ID returned by the API; it is not included in the request.

false

integer (int32)

outputMapping

The output field mappings. Includes field names and types.

true

OutputFieldMappingModel array

overwriteExisting

Whether to overwrite existing records. This setting applies to flat files only; the output file will be overwritten. For all other datasource types, the output is appended to when writing data.

false

boolean
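
An illustrative output configuration writing two source fields plus the $CLUSTER_ID and $MATCH_STATUS result fields to a flat file. The connectionSettings keys and all IDs are placeholders; source 0 denotes a system field, as described under SourceFieldSelector:

    {
      "description": "Matched output",
      "connection": {
        "connectionType": "FLATFILE",
        "connectionSettings": { "filename": "matched_output.csv" }
      },
      "overwriteExisting": true,
      "outputMapping": [
        { "inputField": [ { "source": 1, "field": "2" } ], "outputField": "forenames" },
        { "inputField": [ { "source": 1, "field": "3" } ], "outputField": "surname" },
        { "inputField": [ { "source": 0, "field": "$CLUSTER_ID" } ], "outputField": "cluster_id" },
        { "inputField": [ { "source": 0, "field": "$MATCH_STATUS" } ], "outputField": "match_status" }
      ]
    }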

OutputFieldMappingModel

Name Description Required Schema Default

inputField

The input field, specified as a field name (column number for flat files) together with the ID of the datasource. The field name can also be a system field such as $RECORD_ID or $CLUSTER_ID.

true

SourceFieldSelector array

outputField

The field name or column number to output to.

true

string

RealTimeRecordResponse

Name Description Required Schema Default

matchStatus

The status of the match.

false

enum (EXACT, CLOSE, PROBABLE, POSSIBLE, NONE)

record

The record returned.

false

object

RealTimeSearchResponseModel

Name Description Required Schema Default

results

The results of a real-time search request.

false

RealTimeRecordResponse array
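
An illustrative real-time search response. The contents of each record object depend entirely on your output mapping, so the keys and values shown here are hypothetical:

    {
      "results": [
        {
          "matchStatus": "CLOSE",
          "record": { "forenames": "John", "surname": "Smith", "postcode": "SW1A 1AA" }
        }
      ]
    }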

RecordId

Name Description Required Schema Default

recId

false

string

source

false

integer (int32)

RuleConfigModel

Name Description Required Schema Default

description

Description of the ruleset.

false

string

ruleSetId

Ruleset ID returned by the API.

false

integer (int32)

ruleVersion

Rule version.

false

enum (v1)

rules

A JSON-escaped string containing the rules.

false

string

SearchModel

Name Description Required Schema Default

search

Schema-less JSON object containing search criteria. The keys correspond to datasource input mappings.

true

object
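
A sketch of a search request body. The keys inside the search object are illustrative; they must correspond to the input mappings of your datasource:

    {
      "search": {
        "forenames": "John",
        "surname": "Smith",
        "postcode": "SW1A 1AA"
      }
    }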

SearchResponseModel

Name Description Required Schema Default

results

An array of objects containing the records that match the search input. Each record object contains the record fields from your output mapping along with a match status.

false

MapOfstringAndstring array

SessionConfigModel

Name Description Required Schema Default

blockingKeyIds

The IDs of the blocking key configuration to use with the session.

true

integer (int32) array

datasourceIds

The IDs of the datasource configuration to use with the session.

true

integer (int32) array

description

Description of the session.

false

string

matchStoreConnection

The match store connection type and settings.

true

ConnectionConfigModel

outputId

The ID of the output configuration to use with the session.

true

integer (int32)

ruleSetId

The ID of the rule set to use with the session.

true

integer (int32)

sessionId

Session ID returned by the API; it is not included in the request.

false

integer (int32)
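
A sketch of a session configuration tying together previously created datasource, blocking key, rule set and output configurations. All IDs are illustrative, and the match store connectionSettings keys are hypothetical placeholders:

    {
      "description": "Customer deduplication session",
      "datasourceIds": [1],
      "blockingKeyIds": [1, 2],
      "ruleSetId": 1,
      "outputId": 1,
      "matchStoreConnection": {
        "connectionType": "MONGO",
        "connectionSettings": { "uri": "mongodb://localhost:27017/matchstore" }
      }
    }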

SourceFieldSelector

Name Description Required Schema Default

field

The field name (column number for flat files).

false

string

source

The source ID. Set this to 0 if it is a system field such as $RECORD_ID or $CLUSTER_ID.

false

integer (int32)

SystemStatus

Name Description Required Schema Default

licenseStatus

false

LicenseStatus

warnings

false

string

TransactionResultResponse

Name Description Required Schema Default

changes

The changes made to the clusters.

false

ClusterChange array

clusterId

ID of the cluster.

false

integer (int32)

newClusters

Information about the new clusters.

false

ClusterModel array

oldClusters

Information about the old clusters.

false

ClusterModel array

outcome

The outcome of the transactional request.

false

enum (ADD_NEW, ADD_EXISTING, MERGE, SPLIT, DELETE_RECORD, DELETE_CLUSTER, COMPLEX, NO_CHANGE)

recordId

ID of the record.

false

RecordId

recordsInCluster

Information about the records in the cluster.

false

RecordId array

TransactionalResultModel

Name Description Required Schema Default

results

Results from a transactional request.

false

TransactionResultResponse array
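
To illustrate the shape of a transactional response (all values invented), adding a record that joins an existing cluster might produce something like:

    {
      "results": [
        {
          "outcome": "ADD_EXISTING",
          "recordId": { "source": 1, "recId": "12345" },
          "clusterId": 42,
          "recordsInCluster": [
            { "source": 1, "recId": "12345" },
            { "source": 1, "recId": "67890" }
          ]
        }
      ]
    }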