Does anyone know if Snowflake can connect directly to the Data Lake, or can it only connect to a Birst source extract/staging table?
No experience with SNOW, but there is a Data Fabric API that allows you to query Data Lake data using Compass queries. Not sure if that would be sufficient for your use case.
I believe you can also set up Data Pipelines to stream data out of the Data Lake in Data Fabric, so you might explore that as well. I think it's an additional license though. docs.infor.com/.../default.html
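For the Data Fabric API route, here is a rough idea of what a Compass query over the ION API gateway can look like. This is only a minimal Python sketch: the base URL, endpoint paths and response field names are assumptions/placeholders, so verify them against the Compass API documentation exposed in ION API for your tenant.

```python
import time
import requests

# NOTE: the base URL, endpoint paths and response field names below are
# placeholders/assumptions -- verify them against the Compass API
# documentation in ION API for your tenant.
BASE_URL = "https://mingle-ionapi.inforcloudsuite.com/<TENANT>/DATAFABRIC"
HEADERS = {"Authorization": "Bearer <OAuth2 token obtained via ION API>"}

def run_compass_query(sql: str) -> str:
    """Submit a Compass SQL job, poll until it completes, return the raw result."""
    # Submit the query as an asynchronous job (assumed endpoint).
    resp = requests.post(f"{BASE_URL}/compass/v2/jobs",
                         data=sql, headers=HEADERS, timeout=60)
    resp.raise_for_status()
    job_id = resp.json()["queryId"]  # field name is an assumption

    # Poll until the job reaches a terminal state (assumed endpoint and states).
    while True:
        status = requests.get(f"{BASE_URL}/compass/v2/jobs/{job_id}/status",
                              headers=HEADERS, timeout=60).json()
        if status.get("status") in ("FINISHED", "COMPLETED", "FAILED"):
            break
        time.sleep(2)

    # Fetch the result set (assumed endpoint; format depends on the API version).
    result = requests.get(f"{BASE_URL}/compass/v2/jobs/{job_id}/result",
                          headers=HEADERS, timeout=300)
    result.raise_for_status()
    return result.text

print(run_compass_query("SELECT * FROM <your_datalake_table> LIMIT 100"))
```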
We are patiently waiting for the streaming to be available via SQL Server or something other than Postgres.
The only way we've been able to achieve Data Lake > Snowflake is via the Infor ETL tool.
It's a bit slow, and because Snowflake isn't an RDS, the compute usage is high when importing data.
Has anyone looked at using a 'Data Flow' with 'Stream' to 'Database'? It would seem this should be an alternative for streaming to different local database types (e.g. SQL Server), but I still have to test this.
You can export data from the Data Lake via JSON files and use SFTP as an endpoint connector. Works well.
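On the consuming side, picking up those files can be as simple as the following Python sketch (using paramiko). The host, credentials and folder names are placeholders; this just pulls any new JSON files from the ION drop folder into a local staging area.

```python
import pathlib
import paramiko

# Placeholder connection details -- replace with your SFTP endpoint.
HOST, PORT = "sftp.example.com", 22
USER, PASSWORD = "ion_dropoff", "********"
REMOTE_DIR = "/ion/datalake-export"     # folder the ION connection point writes to
LOCAL_DIR = pathlib.Path("staging")     # local staging area for the JSON files

def pull_new_files() -> None:
    """Download any JSON files from the ION drop folder that we do not have yet."""
    LOCAL_DIR.mkdir(exist_ok=True)
    transport = paramiko.Transport((HOST, PORT))
    transport.connect(username=USER, password=PASSWORD)
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        for name in sftp.listdir(REMOTE_DIR):
            if name.endswith(".json") and not (LOCAL_DIR / name).exists():
                sftp.get(f"{REMOTE_DIR}/{name}", str(LOCAL_DIR / name))
    finally:
        sftp.close()
        transport.close()

if __name__ == "__main__":
    pull_new_files()
```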
Peter, what tool do you use to do the export?
The Stream option in a Data Flow is for a Kinesis connection point. The Pipeline service can go to PostgreSQL on AWS or Aurora, with plans to support MS SQL in the future.
We have a DataPond service that uses the Retrieve from Data Lake Data Flow function to grab data from the Data Lake and then ingest it into an MS SQL database. It requires a Windows OS as well. Reach out to me if you want to hear more.
I imagine he uses the Retrieve method in a Data Lake Data Flow. I would caution against using SFTP direct from the cloud when you have a lot of objects. I have recently found issues with SFTP when a lot of objects are being sent; we had to abandon it because it kept getting too far behind and the pending objects kept going up instead of down. I have seen zero issues when using any of the file connectors, including SFTP, through the Enterprise Connector.
Brandon, thank you. We are trying to find a solution to basically create an ODS that the business can connect to for deep-dive mining of the data, to find issues or potential cost-saving opportunities.
In our environment we are using a custom-built Azure Function which exposes an endpoint to which ION can send the messages via a data flow.
So the data flow sends the JSON files, which are produced at the source, directly to the destination (API endpoint) without the need for the Infor Data Lake.
The JSON files are stored in Azure Blob Storage and then consumed into Snowflake.
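If it helps anyone, here is a minimal sketch of what such a function can look like (Python programming model v2). The route name, container name and connection string are placeholders, and a real function should also validate/authenticate the incoming ION payload.

```python
import uuid
import azure.functions as func
from azure.storage.blob import BlobServiceClient

app = func.FunctionApp()

# Placeholder connection string and container -- both are assumptions.
BLOB_CONN_STR = "<storage account connection string>"
CONTAINER = "ion-messages"

@app.route(route="ion-ingest", methods=["POST"], auth_level=func.AuthLevel.FUNCTION)
def ion_ingest(req: func.HttpRequest) -> func.HttpResponse:
    """Receive a JSON document pushed by the ION data flow and land it in Blob Storage."""
    body = req.get_body()
    if not body:
        return func.HttpResponse("Empty payload", status_code=400)

    # Write each message as its own blob; Snowflake picks them up from the container.
    blob_name = f"incoming/{uuid.uuid4()}.json"
    service = BlobServiceClient.from_connection_string(BLOB_CONN_STR)
    service.get_blob_client(container=CONTAINER, blob=blob_name).upload_blob(body)

    return func.HttpResponse("OK", status_code=200)
```

From there the Snowflake side can consume the container via an external stage with Snowpipe or a scheduled COPY INTO.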
@suneet-sahadevan Coming back to an older thread, but in this model, how/where do you identify the most recent version (or record of truth)? We are doing something very similar, but the management and processing times are getting to be problematic in Azure.
We have created a tool that automatically imports the Data Lake data into a SQL Server database. The data is delivered to a location by a Data Lake flow in ION via the Enterprise Connector, after which the tool imports it. The tables linked in the Data Lake flow are also created automatically with the correct data types.
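For anyone wanting to hand-roll something similar, the table-creation step is essentially a metadata-to-DDL mapping. Below is a deliberately naive Python sketch (not the actual tool) that guesses SQL Server column types from a single sample JSON record; a real implementation should derive the types from the Data Lake metadata/data catalog instead of guessing.

```python
import json
from datetime import datetime

def sql_type(value) -> str:
    """Very naive type guess from one sample value -- for illustration only."""
    if isinstance(value, bool):
        return "BIT"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DECIMAL(28, 8)"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "DATETIME2"
        except ValueError:
            return "NVARCHAR(255)"
    return "NVARCHAR(MAX)"

def create_table_ddl(table: str, sample: dict) -> str:
    """Emit a CREATE TABLE statement for SQL Server from a sample record."""
    cols = ",\n  ".join(f"[{name}] {sql_type(val)}" for name, val in sample.items())
    return f"CREATE TABLE [dbo].[{table}] (\n  {cols}\n);"

if __name__ == "__main__":
    record = json.loads('{"OrderID": 42, "Amount": 19.99, "Created": "2024-05-01T10:00:00Z"}')
    print(create_table_ddl("SalesOrder", record))
```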
As others suggested, there are various ways data can be extracted from the Data Lake to a staging location, from where it can be loaded into Snowflake. That can be achieved with the Data Fabric APIs, either the Object API (raw Data Lake object retrieval) or the Compass SQL API (query the Data Lake), or alternatively by using ION as the push mechanism to one of the ION connection points. When possible, we highly recommend relying on the Data Fabric services for data integration use cases instead of ION, because Data Fabric is designed to handle very large volumes efficiently.
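Whichever extraction route is used, the Snowflake side of the "staging location to Snowflake" step typically comes down to a COPY INTO from an external stage. Here is a minimal sketch with the Snowflake Python connector; the connection parameters, stage and table names are placeholders, the external stage is assumed to already point at the blob/bucket where the extracted JSON files land, and the target table is assumed to have a single VARIANT column.

```python
import snowflake.connector

# Placeholder connection parameters -- replace with your account details.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="********",
    warehouse="LOAD_WH", database="DATALAKE_STG", schema="RAW",
)

try:
    cur = conn.cursor()
    # Load the staged JSON files into a table with a single VARIANT column.
    # DATALAKE_STAGE is an assumed, pre-created external stage.
    cur.execute("""
        COPY INTO RAW.SALES_ORDER
        FROM @DATALAKE_STAGE/sales_order/
        FILE_FORMAT = (TYPE = 'JSON')
        ON_ERROR = 'CONTINUE'
    """)
finally:
    conn.close()
```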
Data Fabric Stream Pipelines is the feature that currently provides real-time data movement when the source publishes using our streaming technology, Streaming Ingestion. Currently M3, LN and Automotive Exchange support publishing data through Streaming Ingestion, and more CloudSuite ERPs will follow. The other services and data delivery features always use the data from the Data Lake, which is not designed to be real-time. Stream Pipelines can also process data that is published in batch; however, in that case Stream Pipelines also consumes the data from the Data Lake, and there will be a delay from the source batching process.
It is correct that Stream Pipelines currently only supports connectivity to Postgres destinations. However, there is good news for customers using Snowflake:
We are targeting the October 2024 update to release a native Snowflake connector for Stream Pipelines, based on the Snowflake Streaming API. Stream Pipelines will be able to stream the source-published data events directly to a Snowflake table, eliminating any delays and the need to stage files in a storage location and then load them into Snowflake tables with Snowpipe. Happy to share more information.
Please note that the roadmap comes with the disclaimer that timelines and features are tentative and can change. Some of you mentioned Microsoft SQL Server; just letting you know we are targeting support for MS SQL Server destinations in 2025.
Please check out the Data Fabric Roadmap deck in Concierge - Important Documents for a brief overview of our roadmap and the upcoming features, and feel free to contact your Infor representative to discuss the upcoming features in more detail.
Thank you!
That is awesome. Is there a site that can help us determine what the cost to stream these would be using Pipelines vs. the Infor ETL tool, APIs, etc.?