What are the differences between OLTP and OLAP databases?

Jan 30, 2023
Author: Robin Tang
When differentiating between databases, one key characteristic is whether the database is OLTP or OLAP. In this blog post, we will explore what they are, their key differences, and some real-world examples.

What's OLAP and OLTP?

OLTP

Online transactional processing (OLTP) databases are characterized by high-throughput, low-latency transactions that serve applications.
OLTP databases are known for strong data integrity, transaction support, and indices that serve well-known access patterns.
OLTP databases are used to enable applications to serve requests such as:
  • Checking whether a user can log in
  • Processing a credit card payment
  • Connecting a rider with a driver
  • Dispensing cash at an ATM
  • Checking a guest into a hotel
  • Issuing an airline ticket
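
To make the OLTP shape concrete, here is a minimal sketch of such a request: a short, index-backed transaction that either fully commits or fully rolls back. It uses SQLite as a stand-in for any OLTP database; the table and column names are hypothetical.

```python
import sqlite3  # stand-in for any OLTP database

def process_payment(conn: sqlite3.Connection, account_id: int, amount_cents: int) -> None:
    """Debit an account inside a single short-lived transaction."""
    with conn:  # commits on success, rolls back on any exception
        row = conn.execute(
            "SELECT balance_cents FROM accounts WHERE id = ?",  # primary-key lookup
            (account_id,),
        ).fetchone()
        if row is None or row[0] < amount_cents:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
            (amount_cents, account_id),
        )
```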

OLAP

Online analytical processing (OLAP) databases primarily consist of aggregated data (coming from various data sources), and the dataset is a historical snapshot of the upstream sources. OLAP databases are primarily used for data analysis, reporting, and other analytical queries. While performance still matters, it is typically much more forgiving than for OLTP databases.
Examples of such OLAP queries may include:
  • How many customers are signing up per month?
  • What is the demographic breakdown of our VIP customers?
  • What is our paid acquisition channel performance (cost per thousand impressions, cost per click, customer acquisition cost, customer lifetime value)?
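
Each of these questions translates into an aggregation over large historical tables. A hedged sketch of the first one, assuming a hypothetical customers table (date functions vary by warehouse dialect):

```python
# Monthly signups: an aggregation scan rather than an indexed point lookup.
MONTHLY_SIGNUPS_SQL = """
SELECT
    DATE_TRUNC('month', created_at) AS signup_month,
    COUNT(*)                        AS signups
FROM customers
GROUP BY 1
ORDER BY 1
"""

# Run it with any DB-API 2.0 warehouse driver (e.g. a Snowflake or Redshift connector):
# cursor.execute(MONTHLY_SIGNUPS_SQL)
# for month, signups in cursor.fetchall():
#     print(month, signups)
```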

Key Differences

| Category | OLTP | OLAP |
| --- | --- | --- |
| Access patterns | Static and well established. Queries are optimized by creating indices to support known access patterns. Querying an unindexed field is extremely slow because it is an anti-pattern for this database type and requires a full table scan. | Dynamic. The database allows dynamic queries across a multitude of datasets, providing maximum flexibility for data analysis. |
| Type of workload | Reading, updating, and deleting data. Most suitable for applications and processing transactions. | Primarily queries (reads). Most suitable for data analysis, machine learning applications that use offline datasets for training, and other read-only workloads. |
| Relational vs NoSQL | Can be either relational or NoSQL. | Strictly relational, as OLAP databases are optimized for data analysis via SQL. |
| Main users | Applications. It's advisable for each application to have its own database instance. | Data/BI analysts, data scientists, machine learning applications, no-code tools. |
| Performance | Operations should be measured in milliseconds. Performance is hypercritical. | Much more forgiving, as much of the workload is internal-facing. |
| Examples | Relational: MySQL, PostgreSQL, CockroachDB. NoSQL: MongoDB, Amazon DynamoDB, Google Bigtable, Apache HBase, Apache Cassandra, ScyllaDB. | Snowflake, Google BigQuery, Azure Synapse, Databricks, Amazon Redshift. |
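
To make the "access patterns" row concrete: on the OLTP side, a known query shape is served by an index created ahead of time, while an unindexed query forces the full table scan described above. A minimal sketch with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect("app.db")

# A well-known access pattern ("find a user's orders") gets an index up front.
conn.execute("CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders (user_id)")

# Indexed lookup: fast, touches only the matching rows.
conn.execute("SELECT id, total_cents FROM orders WHERE user_id = ?", (42,))

# Unindexed predicate on the same table: a full table scan, the OLTP anti-pattern.
conn.execute("SELECT id FROM orders WHERE notes LIKE '%refund%'")
```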

Putting things together

A typical system architecture diagram, as seen at most companies.
In this diagram, the customer places an order, which results in data being saved in the Order Postgres DB. The order service also makes an API call to the payment service, which has its own DB. Internal processes (typically ETL jobs) run periodically to synchronize data from each service's transactional database into the company's data warehouse.
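
A rough sketch of such a periodic ETL job, using DB-API style connections for brevity (the table names, schemas, and incremental watermark are hypothetical; real pipelines add batching, state tracking, and error handling):

```python
import datetime

def sync_orders(source_conn, warehouse_conn, last_synced_at: datetime.datetime) -> None:
    """Copy rows changed since the last run from the service DB into the warehouse."""
    rows = source_conn.execute(
        "SELECT id, customer_id, total_cents, updated_at "
        "FROM orders WHERE updated_at > ?",
        (last_synced_at,),
    ).fetchall()
    warehouse_conn.executemany(
        "INSERT INTO analytics.orders (id, customer_id, total_cents, updated_at) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    warehouse_conn.commit()
```

Because the job only runs on a schedule, the warehouse copy lags the source by up to one sync interval, which is the gap Artie Transfer addresses below.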
Separately, there are processes to synchronize external data from Zendesk, Stripe, and Mailchimp. The exact methodology depends on the vendors' capabilities: some vendors have native integrations with selected data warehouses, while others require custom integrations via their APIs. Companies like Fivetran, Airbyte, and Rivery provide connectors that integrate with these tools' APIs and load the datasets into your data warehouse.

How does Artie Transfer bridge the gap?

Artie Transfer drastically improves this internal process by streaming change events and applying only the changes to the downstream data warehouse. By doing so, Transfer provides real-time transactional data within your data warehouse.
In order to do this, Artie Transfer has the following features built in:
  • Automatic retries & idempotency. We take reliability seriously and it's feature 0. Latency reduction is nice, but doesn't matter if the data is wrong. We provide automatic retries and idempotency such that we will always achieve eventual consistency.
  • Automatic table creation. Transfer will create the table in the designated database if the table doesn't exist.
  • Error reporting. Provide your Sentry API key and errors from data processing will appear in your Sentry project.
  • Schema detection. Transfer will automatically detect column changes and apply them to the destination.
  • Scalable architecture. Transfer's architecture stays the same whether we’re dealing with 1GB or 100+ TB of data.
  • Sub-minute latency. Transfer is built with a consumer framework and is constantly streaming messages in the background. Say goodbye to schedulers!
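
The following is not Artie Transfer's actual code, just a conceptual sketch of the streaming pattern described above: consume change events continuously and apply each one to the warehouse with an idempotent merge keyed on the primary key, so that retries and replays converge to the same final state. The topic, payload shape, and table names are all hypothetical.

```python
import json

from kafka import KafkaConsumer  # assumes the kafka-python package

consumer = KafkaConsumer("orders.cdc", bootstrap_servers="localhost:9092")

# Idempotent upsert keyed on the primary key: replaying the same event yields the
# same row, which is what makes automatic retries and eventual consistency safe.
# (Placeholder syntax varies by warehouse driver.)
MERGE_SQL = """
MERGE INTO analytics.orders AS t
USING (SELECT ? AS id, ? AS customer_id, ? AS total_cents) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET customer_id = s.customer_id, total_cents = s.total_cents
WHEN NOT MATCHED THEN INSERT (id, customer_id, total_cents)
    VALUES (s.id, s.customer_id, s.total_cents)
"""

def run(warehouse_conn) -> None:
    for message in consumer:  # blocks and streams continuously; no scheduler needed
        event = json.loads(message.value)
        row = event["after"]  # Debezium-style payload carrying the post-change row
        warehouse_conn.execute(
            MERGE_SQL, (row["id"], row["customer_id"], row["total_cents"])
        )
        warehouse_conn.commit()
```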
If you are interested in learning more, check out Transfer's GitHub repo, schedule a demo here, or drop us a note at [email protected]!