What the heck is Ambar?!

Tom Schutte

Mar 1, 2024

7 Mins Read

There are dozens of different database flavors, each catering to a specific use case, but these databases are typically designed to handle single events and store data at rest, not to feed real-time systems. Getting data into motion for real-time use is a much more interesting challenge, and it has been an ongoing problem since the inception of the internet. Getting your data from point A to point B might involve undersea cables, microwaves, lasers, or even satellites. Layer on top of this the need for guarantees about that data in motion, like availability, durability, and low latency. This has led to incredible data streaming technologies like Apache Kafka, Apache Pulsar, and countless others.

However, these real-time data streaming systems can be complex to manage, with dozens or hundreds of configuration parameters and a need for deep distributed-systems expertise to operate reliably at scale. Various managed offerings have taken steps to simplify things, but they still demand a high level of end-user management and system understanding, solving only one part of the friction of using real-time data streaming. Typically these solutions get you started with a good-enough initial configuration and abstract away hardware management, helping your engineers move faster and reducing the expertise required to get going. But you still have to build connectors to your existing data stores, build mechanisms to get your data into motion, and apply analysis and filtering before you get your final result. That's all before we even begin to talk about ordering and delivery guarantees.

Ambar changes that.

With Ambar, you simply add your existing database as a DataSource, describe your desired data filtering, and set up a simple HTTP endpoint to which Ambar will push the resulting ordered event sequence. You keep using your existing data storage systems and build a simple HTTP interface to receive your newly streamed data, enabling new features and faster business decisions. Ambar owns the pipeline in between, so you never have to worry about writing custom connectors or managing pipeline infrastructure, scaling, or configuration. With a few lines of configuration for your DataSource, Filter, and DataDestination, you can have your data in motion in a matter of minutes, with minimal strain on existing systems and infinite data storage that allows for rapid rebuilds and deployment of read models. Let's take a look at an example.

———————————————————————————————————————————

Let's consider an example where a business has a catalog of products which it sells in Europe, Asia, and North America. It stores details about the products in a Postgres database, where the sales of each product are tracked with details like its unique product code, time of sale, region, and so on. During a typical day the business might sell a few hundred products, and during the holiday season this might peak at a few thousand sales per day, averaging around 20 transactions per second (TPS) on peak days. Given this reasonably low rate of updates, they have sized the database to handle peak load while supporting other systems without over-provisioning hardware. This hypothetical business has been in operation for about a decade and has a few million records. The business is finally ready to expand its product catalog into the South American market. To decide which products to offer initially, analysts want to see which products have sold well in the North American market. They also want to see those sales in order, so they can account for how different seasons affect product sales in the new region, and Ambar's ordering guarantee helps with exactly that.

Let's build this view using Ambar to see how we can rapidly deploy new read models on historical and real-time data, without impacting our existing systems. First we need to link our Postgres database with our Ambar environment so Ambar can import all of our historical data. This is easy with Terraform: we simply create an ambar_data_source resource like so:

resource "ambar_data_source" "product_sales_postgres_database" {
  data_source_type = "postgres"
  description = "Reads from the global product sales table"
    partitioning_column = "productId"
      serial_column = "sequenceId"
        username = "ambarUser"
          password = "ambarUserPass"
            data_source_config = {
              "hostname": "product-db.prod.bigbusiness.com",
              "hostPort": "5432",
              "databaseName": "global-product-catalog",
              "tableName": "product-sales",
              "publicationName": "ambar_product_sales_conn",
              "columns": "productId,sequenceId,region,sale_qty,sale_date,..."
            }
}
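
The data_source_config above references a publicationName, which suggests Ambar reads changes through Postgres logical replication. Below is a minimal, hypothetical sketch of preparing the database side with Python and psycopg2; the admin credentials are placeholders, and the exact privileges and settings Ambar requires should be confirmed against Ambar's documentation.

import psycopg2  # pip install psycopg2-binary

# Connect as an existing administrative user (placeholder credentials).
conn = psycopg2.connect(
    host="product-db.prod.bigbusiness.com",
    port=5432,
    dbname="global-product-catalog",
    user="postgres_admin",
    password="adminPass",
)
conn.autocommit = True
with conn.cursor() as cur:
    # Create the user referenced by the ambar_data_source resource and let it read the table.
    cur.execute("CREATE USER \"ambarUser\" WITH REPLICATION PASSWORD 'ambarUserPass';")
    cur.execute('GRANT SELECT ON "product-sales" TO "ambarUser";')
    # Create the publication named in data_source_config.
    cur.execute('CREATE PUBLICATION ambar_product_sales_conn FOR TABLE "product-sales";')
conn.close()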

Let's also set up a Filter on our DataSource so that we are only looking at North American market sales. This is also easy with Terraform, using an ambar_filter resource.

resource "ambar_filter" "north_american_sales_filter" {
  description = "Reads from the global product sales table"
    dataSourceId = ""
      # When passed via Terraform or the API, the contents of the filter are Base64
        # encoded. Here is the filter contents decoded so we can inspect what it is
          # doing.
            # 'substring(lookup("region"), "north-america")'
            filterContents = "c3Vic3RyaW5nKGxvb2t1cCgicmVnaW9uIiksICJub3J0aC1hbWVyaWNhIik="}
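
Since the filter contents are passed Base64 encoded, here is a quick way to produce that value from the human-readable filter expression (a small Python sketch; any Base64 tool works just as well):

import base64

# The human-readable filter expression from the comment above.
filter_expression = 'substring(lookup("region"), "north-america")'

# Base64 encode it for use as the filter contents value in Terraform or the API.
encoded = base64.b64encode(filter_expression.encode("utf-8")).decode("ascii")
print(encoded)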

Finally, we'll need a location to send the results. We can set up a simple HTTP endpoint for Ambar to deliver the resulting sequence to. This is again easy with Terraform, configuring an Ambar DataDestination.

resource "ambar_data_destination" "filtered_sales_na" {
  filter_ids = [
    ambar_filter.north_american_sales_filter.resource_id
  ]
    description = "Sends filtered sales sequences for the North American market."
      destination_endpoint = "https://prod.bigbusiness.com/filtered-data"
        destination_name = "na-filtered"
          username = "ambarUserDest"
            password = "ambarUserDestPass"
}
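
On the receiving side, the destination_endpoint (https://prod.bigbusiness.com/filtered-data) needs to be a service you run. Below is a minimal Python sketch of such an endpoint, assuming Ambar delivers records as JSON over HTTP POST and authenticates with the username and password configured on the DataDestination; the exact payload shape and acknowledgement contract are assumptions here, so check Ambar's DataDestination documentation for the real details.

import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Credentials configured on the ambar_data_destination resource above.
EXPECTED_AUTH = "Basic " + base64.b64encode(b"ambarUserDest:ambarUserDestPass").decode("ascii")

class FilteredSalesHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Reject requests that do not carry the configured credentials.
        if self.headers.get("Authorization") != EXPECTED_AUTH:
            self.send_response(401)
            self.end_headers()
            return

        # Assumed payload shape: a JSON body containing one or more records.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for record in payload.get("records", []):
            # Hypothetical processing step: update the read model used to pick
            # products for the South American launch.
            print(record.get("productId"), record.get("sale_date"))

        # Acknowledge so Ambar can deliver the next records in each sequence.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), FilteredSalesHandler).serve_forever()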

From here, Ambar will ingest the historical data from the Postgres database into the Ambar environment. It will then extract the data and apply the filter to each product record sequence before delivering the resulting event sequence to the configured HTTP endpoint. New events in the Postgres database are automatically pulled into the Ambar environment, filtered, and delivered, ensuring the read model always has the latest view. Decoupling the view from the Postgres database also means we could create a new filter and set up a new DataDestination to build another view, without impacting the Postgres database and in a fraction of the time.

It's really that easy! Want to learn more? Find us at ambar.cloud or contact us at contact@ambar.cloud

Over $3B in Transactions Processed

Discover what Ambar could do for you!
