Published January 21, 2021
Data warehouse vs master data management
Aggregating a lot of data into a centralised place, cleaning and joining data from different sources when building a data warehouse looks very similar to what we do with an MDM. So what’s the difference, and when might one be enough?
Frequently when building a data warehouse, you’ll notice that you’re starting to aggregate a lot of data into a centralised place and cleaning/joining data from different sources. At a glance, this looks very similar to what we do with an MDM. So what’s the difference, and when might one be enough?
What is a data warehouse?
A data warehouse is a central system providing a place to store, model, and analyse data from across a business. Data warehouses are generally MPP databases which provide a SQL interface. Data will be loaded into the data warehouse through a data integration tool and then modelled by analysts to produce cleansed data. This data is then used to provide reporting for the organisation, such as total sales volume per region.
Examples include Snowflake, Azure Synapse, BigQuery, and Amazon Redshift.
What is an MDM?
In short, a Master Data Management solution is a central system providing a master version of important company records. For example, a bank might use an MDM to see a single customer record that spans their personal banking system, their business banking system, their fraud detection software, and even their email marketing system.
Examples include CluedIn, Microsoft SQL Server Master Data Services, and Informatica MDM.
How they’re similar
- Both require integrating data from multiple sources across the business, cleaning that data, and connecting like records together (for example, finding the same employee across various systems).
- Both present a unified view of records for analysis or lookup.
- Both require manual work to cleanse data and match records (though the extent might differ).
How they differ
With the below, it’s important to keep in mind that every MDM tool will have different functionality. Not all of the below will be true for every tool but will give you an idea of the spectrum of tools.
- Data warehouses are generally designed for large scale analytics over many records (for example summing a column) compared to MDMs which expect you to generally be operating on a single record at a time to find all the information for that specific record (some are also designed for large scale joining/matching of records).
- MDM tools generally support editing individual records directly, compared to data warehouses which are predominantly a result of the source data combined with SQL modelling logic, not handmade edits.
- Some MDM tools are designed for bi-directional syncing, meaning a manual edit can propagate to the source system it came from, ensuring your source systems are kept up to date.
- Some MDM tools support automated matching/joining of records across systems through the use of entity resolution. This might mean that it can use an email address to join three records together and through that process discover an employee ID, which it can then use to match an additional five records. This would have to be built manually in a data warehouse environment.
How they complement each other
A good MDM tool isn’t just designed for humans, it’s designed to be able to be used by other systems as well. If you find you’re spending a lot of time manually fixing up records in your data warehouse, an MDM with powerful entity resolution capabilities might be able to replace a large part of your data integration process. In this world, you’ll be using the data warehouse for analysis of records that the MDM has automatically cleaned and matched.
Why you might not need an MDM
As the data maturity of organisations improves, we might find that we’re relying less on scattered data and more on well structured decentralised systems. This is fundamentally a problem of ownership, which is why we see it generally being solved in organisations with sticky teams and solid boundaries (such as microservice-centric organisations). That’s not to say that organisations won’t end up with data scattered across different systems, but the solution might no longer be a centralised MDM, and we might be able to rely more on decentralised integrations between teams and services.
For most of us, however, we will find that the first priority in our organisation will be reporting and analytics, not cleaning data and matching records (and even then, that can generally be done in the bounds of a data warehouse project). As such, most organisations will be able to use the data warehouse in lieu of an MDM, feeding any data issues back into organisational change or source system data improvements, as opposed to into a centralised tool.
Get in touch with Ruben Slabbert
Ruben is a solution architect with data engineering, infrastructure modernisation, and application development experience across Microsoft Azure, Google Cloud Platform, and Amazon Web Services.Get in touch