Compass Driver Connection in Birst Connect - Is there syntax to avoid cached layer?

Legacy Contributor

Good Day!

Ultimately this is a Compass / driver question, but here is the framing of the problem:

In one of the latest releases for Birst, a new Connect option was enabled that supposedly uses the Compass JDBC driver connection to connect to the Infor Data Lake, for Multi-tenant (non-Govcloud) customers.

In theory, this is wonderful for limiting initial extract sizes, since it allows filtering beyond the partition/high-level indexing, which generally has us using the lastmodified datestamp and then filtering through subsequent scripting.
In practice, we are hitting the cached / optimized layer provided on the data lake, which is not necessarily up-to-date.
- I understand the data lake is semi-relational/non-relational data storage, and the querying is more SQL-like.

A few related questions:

What is the frequency of the rebuilds of this optimized layer for Compass? (e.g. nightly, weekly, every 5th run)
Is there query syntax available through Compass to avoid hitting the cache layer?
- Even with a less-than-optimal pull of the data, it should be faster than writing all the records for a given time frame, especially for smaller (active records) sources that are modeled as full refresh.
- I'm assuming we are using a read-only connection, so running EXEC commands won't work through Birst.
Is there a setting in ION Desk / Data Fabric that affects when these intermediate layers are rebuilt?

If we don't know when the data is refreshed, we cannot communicate correct framing to users.

Any clarification of my understanding of the functionality or the use-case, any unsolicited but relevant information, or answers to the questions above would be greatly appreciated!

Ben

Accepted answers

Legacy Contributor

Hey @blew ,

Can you share some more observations on what you've seen or experienced that might be impacting your workflows or designs? Or are you anticipating that these concepts could, in turn, present challenges for future projects? In short, Compass SQL returns results based on what's in storage at the time of the query.

Compass SQL utilizes columnar technologies behind the scenes so one way in which you can optimize query run-time is by only selecting the columns needed for the use case, as opposed to a SELECT *. As well, performing aggregations and other columnar options at extraction tends to optimize the overall end-to-end workflow.
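To make the column-pruning point concrete, here is a minimal sketch in Python that only builds the query text. The table and column names in the usage note are made up for illustration, and plain SUM / GROUP BY is assumed to be acceptable to Compass SQL; check the Compass documentation before relying on it.

```python
def build_extract_query(table, columns, where=None):
    """Build a SELECT that names only the needed columns instead of SELECT *."""
    if not columns:
        raise ValueError("name at least one column; avoid SELECT *")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql


def build_aggregate_query(table, group_by, sum_columns):
    """Push a SUM / GROUP BY into the extract so fewer rows are pulled out."""
    if not group_by or not sum_columns:
        raise ValueError("need at least one grouping column and one sum column")
    # Alias each sum as total_<column> so downstream scripts get stable names.
    sums = [f"SUM({col}) AS total_{col}" for col in sum_columns]
    select_list = ", ".join(list(group_by) + sums)
    return f"SELECT {select_list} FROM {table} GROUP BY {', '.join(group_by)}"
```

For example, `build_aggregate_query("inventory_by_warehouse", ["warehouse"], ["qty_on_hand"])` (hypothetical names) yields one row per warehouse rather than one row per inventory line, which is the kind of reduction described above.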

While there is a 'caching*' layer that Compass SQL directly interfaces with, that caching is event-driven based on a client querying a given table. As a result, some quantity of time in your query is spent in identifying any new data objects that've landed in storage and caching them before the query truly begins. At present, this lead-time in caching can be more directly felt when a large volume of data has entered the Data Lake or due to infrequent querying resulting in a large build-up of data objects that are first processed for caching for the query.**

As a result, some clients will schedule warm-up queries on some frequency (e.g. daily) to keep the cache warmer and reduce the backlog of data objects that'll need to be processed before the query really starts crunching data***. We tend to see this impact a small percentage of clients because queries are often extremely frequent by nature of solution requirements and implementations.
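A warm-up routine along those lines might look roughly like the sketch below. It is not an Infor-provided utility: `run_query` is a hypothetical callable you would supply (for example, a thin wrapper around whatever Compass client or JDBC bridge you use), and it is an assumption that a simple COUNT(*) is enough to trigger processing of newly landed data objects.

```python
import time


def warm_up(tables, run_query, log=print):
    """Run one cheap query per table and report how long each one took.

    run_query is a hypothetical callable (sql -> rows) supplied by the caller.
    Returns {table: seconds}, with None for tables whose query failed.
    """
    timings = {}
    for table in tables:
        # Assumption: a COUNT(*) is enough to make Compass pick up new objects.
        sql = f"SELECT COUNT(*) FROM {table}"
        started = time.monotonic()
        try:
            run_query(sql)
        except Exception as exc:
            # Keep going so one bad table does not block the others.
            log(f"warm-up failed for {table}: {exc}")
            timings[table] = None
            continue
        timings[table] = time.monotonic() - started
        log(f"warm-up for {table} took {timings[table]:.1f}s")
    return timings
```

Scheduling is left to whatever you already have (cron, a Birst workflow, and so on). Logging the elapsed time per table gives a rough signal of how much backlog had built up since the last query.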

* While you'll often hear this process referred to as "caching," it's not a cache in the conventional sense. When new data enters Data Lake storage, it's eligible to be picked up and processed by a query. A more technically accurate, if esoteric, way to describe what Compass is doing is "reformatting": creating a virtual copy of your data objects in an optimal data format that provides for faster query times and processing speeds. This is why we do have a hint called --*skipReformatting that allows clients to query what's already been reformatted. This can be useful to bypass the caching mechanism.

** Our product team is working on a substantial update for Compass later this year that will all but eliminate the 'caching' penalty felt by clients, especially in circumstances where querying a given table is infrequent or an administrator runs a Compass stored procedure that drops the cache.

*** Administrators run "warm-up" queries just to keep the caching activities relatively small. While this can certainly be helpful short-term, our platform updates anticipated for October should reduce the need to rely on these administrative tricks in the absence of frequent interactions with a data object.

All comments

Legacy Contributor

Thanks, Mike! Very thoughtful response, and the columnar indexing helps - I've worked with Trevor Patton on the Infor side with Columnar on the CSF/HRT side. Unfortunately for me, we are using Infor Line Baan (LN) which shortens my leash a bit.

For clarification, the caching is actually the queries being passed through Athena, correct?

To your questions:

We were attempting to load the Item Inventory by Warehouse (WHWMD215) for LN through Birst using the Compass connector, but on subsequent runs the column totals deviated from what came out of the initial table loads from the source system. On digging into the lines, we determined that Compass was not returning the most recent changes, as we were running this workflow in the middle of the day (while many transactions are passing through). When we cleared the tables and the "cache", we got more correct counts (still tough to align perfectly with ever-changing lines midday).

Currently, this is presenting challenges in communicating refreshes to users/stakeholders. I'm not sure about the warm-up queries, though it's an interesting thought... May play around with that if time permits.

We will keep an eye on the October update, and I'll see whether skipReformatting could help in any way.

Thanks Again for the Helpful Response!
Ben

Legacy Contributor

Hey Ben,

So, it doesn't sound like this is a case of a large backlog of data waiting "to be cached" but rather that the results you're seeing via Compass are behind what you see in LN.

High volume transaction tables might be difficult to pin down and determine if any two systems are in-sync. The solution architecture, such as it exists today, is oriented towards batch, big data use cases. It's otherwise considered "eventually consistent."

That's attributable to the fact that applications often accumulate record updates for some period of time before sending them to Data Lake. Often and most typically today, ION is used to ferry the data that applications send over to Data Lake. ION guarantees "at least once" delivery but delivery times can be variable due to a number of factors. As such, trying to pin down record counts between Compass (Data Lake) and the system of record is impractical.

Whether you're using the original retrieval interface from Data Lake (the OG released in 2017, I believe) or Compass, I would bet that you're observing the same behaviors of not seeing LN data exactly matching that of the system of record. If that's the case, then it's another observed tick towards the "eventual consistency" architecture description above.

I've seen some customers implement routines to check record counts "as of" a particular time - say, 3 hours ago. The idea is that any data that was sent, delivered, and stored would be relatively stable and mature and you're not contending with data movement challenges.
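A rough sketch of that "as of" check, again just building the query text in Python. The 3-hour lag comes from the example above; the timestamp column (lastmodified, as mentioned earlier in the thread), the table name passed in, and the timestamp literal format are all assumptions to verify against your own data objects.

```python
from datetime import datetime, timedelta, timezone


def as_of_cutoff(now, lag_hours=3):
    """Return the point in time before which data is treated as settled."""
    return now - timedelta(hours=lag_hours)


def build_as_of_count_query(table, timestamp_column, cutoff):
    """Count only rows last modified at or before the cutoff."""
    # Assumption: the engine accepts 'YYYY-MM-DD HH:MM:SS' timestamp literals.
    stamp = cutoff.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT COUNT(*) AS row_count FROM {table} "
        f"WHERE {timestamp_column} <= '{stamp}'"
    )


if __name__ == "__main__":
    cutoff = as_of_cutoff(datetime.now(timezone.utc))
    # "inventory_by_warehouse" is a hypothetical table name.
    print(build_as_of_count_query("inventory_by_warehouse", "lastmodified", cutoff))
```

Running the same cutoff against the system of record and against Compass compares two relatively stable snapshots, instead of chasing rows that are still in flight through ION.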

Ultimately, it comes down to aligning functional user expectations: the data will reflect the business at a certain point in time, but it's not real-time. That said, users do have real-time expectations, so we have a new feature coming in April 2023 for real-time, streaming pipelines.

Copyright © 2025 Infor. All rights reserved.