[How To] Load data from a file server to the Data Lake

Hey all,

Back again and, this time, with a double header. In the first How To, we connected to a database and used AnySQL both to register a metadata model (that we would reference in subsequent queries) and to schedule a data ingestion activity to move some data into the Data Lake. This time, we'll use a file server to move a prepared data set that a fictional 3rd party application uploaded for us.

All throughout, I'll upload the data sets used as well as metadata files so you can follow along in your own Infor OS tenant for good ol' practice.

Getting our metadata in order

Just as when we connected to a database in our previous How To, we're going to use ION's File Connector to connect to a personal sFTP server, define our document/file retrieval, set up a schedule, and move some data into the Data Lake.

One of the key differences between this and our previous How To is that, last time, AnySQL generated our metadata for us. Since we're dealing with a file server, we need to define our metadata up-front before we start polling the server. There are a couple of ways you can do this.

Data Catalog's Metadata Registration Wizard (easily the easiest approach)
Writing your own metadata package (far less easy & fun)

We'll assume you took the easy route, used the metadata wizard, and generated your metadata file. If you'd like, now's your opportunity to download the Products.zip file and import that into the Data Catalog.

[View:/cfs-file/__key/communityserver-discussions-components-files/96/Products.zip:320:240]

Once you've got Products.zip downloaded, let's crack open Infor OS and navigate to ION Desk, expand the Data Catalog, and drill down to the Object Schemas page - this is where we store all of our metadata across Infor OS. In the toolbar, select the import icon and browse to your Products.zip file.

Once uploaded, you'll have your very own copy of the metadata. There are some interesting characteristics of this Products object such as the indicators associated with the ProductID, variation, and deleted columns. We'll explore these deeper in our next How To.

Let's start connecting some points

We've got the hard part out of the way quite easily. Now, we can focus on the file server.

Let's hit the Connect menu and drill down to Connection Points. If you already have one, great. I'll be using my sFTP server entitled Community. However, you may need to create your own file connection point. Simply click on the + Add tile. Selecting the File option means you're using ION's Enterprise Connector to connect to your on-prem network. If you have a cloud-based FTP server, however, you can simply select the File (sFTP from Cloud) option.

You'll need to provide the following details:

Host Name (aka your file server's URL)
Port (usually 21 for FTP or 22 for sFTP)
Username
Password

If you're using sFTP, FTPS, or other secured means of connecting to and accessing the file server, you can read up on how to configure your connection point to utilize those recommended security benefits under the Infor ION Desk User Guide. Tsk tsk, I've gone the easy route.

It's always good hygiene to check that your connection point tests out.
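If the test in ION Desk fails, it can help to rule out basic network problems first. Here's a minimal Python sketch that only checks whether a TCP connection to the host and port can be opened. It's my own illustration, not what ION's test button does, and it doesn't verify your username or password.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the with-block closes it again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, can_reach("sftp.example.com", 22) would tell you whether port 22 is reachable from your machine. The host name here is a placeholder, so swap in your own file server.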

Document Retrieval Configuration

We're nearly to the point of moving some data in. Nearly. First, though, we must help ION understand what it's looking for on our sFTP server. To do that, let's browse to our new connection point's Documents tab. Select the + icon to add in a new document. In this scenario, we'll select the Document radio button and search for our new Products JSON object in the Data Catalog.

Once we've done that, we just need to provide some friendly instructions on what files to look for and where. Under the Read File Settings section, tick the Create non-existing folder(s) checkbox.

Read Location: What directory on your file server should ION check?
Error Location: Where should ION stuff any errors if it hits a problem?
File Name Pattern: Enter in a naming convention that ION will use to process this file as our Products metadata. In this case, let's enter products*.json
After Read: Select your preferred option on what ION should do with the file once it's done processing. I recommend simply keeping it on Move File
Archive Location: Where ION places your processed files after reading them
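To get a feel for which file names a pattern like products*.json would pick up, here's a quick Python sketch using shell-style wildcards. It assumes ION treats * the way a typical glob does and that matching is case-sensitive; I haven't confirmed either, so treat the result as a rough guide. The candidate file names are made up for the example.

```python
from fnmatch import fnmatchcase

PATTERN = "products*.json"


def matching_files(names, pattern=PATTERN):
    """Return the names that match the wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical directory listing from the Read Location.
candidates = [
    "products_1.json",
    "products.json",
    "Products_1.json",    # capital P: no match when case-sensitive
    "products_1.csv",     # wrong extension
    "new_products.json",  # pattern must match from the start of the name
]

print(matching_files(candidates))  # ['products_1.json', 'products.json']
```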

Easy and straightforward. Now, we've got to get a file uploaded and into the directory where we defined our Read Location.

Moving some data

It's time to get some data into the Data Lake. First things first - we'll upload a fictional file full of products (or items, if you prefer). You can take a quick peek into the file; our 3rd party application generated a list of products it wants to replicate in ndjson format. Our first row is sampled below and describes a Chai product along with some supplementary details about how many units are in stock, quantities per unit, and so on.

{"productid":1,"productname":"Chai","supplierid":1,"categoryid":1,"quantityperunit":"10 boxes x 20 bags","unitprice":18.0,"unitsinstock":39,"unitsonorder":0,"reorderlevel":10,"discontinued":false,"variation":1,"deleted":false}
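Since ndjson is just one JSON object per line, you can sanity check a file before uploading it by parsing it line by line. Here's a minimal Python sketch; read_ndjson is a helper name I made up, and skipping blank lines is a choice made here rather than something ION is known to do.

```python
import json


def read_ndjson(text):
    """Parse newline-delimited JSON: one JSON value per non-empty line."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines (an assumption of this sketch)
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report which line is broken so it's easy to find in the file.
            raise ValueError(f"line {line_no} is not valid JSON: {exc.msg}") from exc
    return records


# The first row of the sample file, as shown above.
sample = (
    '{"productid":1,"productname":"Chai","supplierid":1,"categoryid":1,'
    '"quantityperunit":"10 boxes x 20 bags","unitprice":18.0,"unitsinstock":39,'
    '"unitsonorder":0,"reorderlevel":10,"discontinued":false,"variation":1,'
    '"deleted":false}\n'
)

records = read_ndjson(sample)
print(records[0]["productname"], records[0]["unitsinstock"])  # Chai 39
```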

You can download a copy of this file to use in the exercise. Let's get that uploaded to our file server's read directory and set up our Data Lake Flow.

[View:/cfs-file/__key/communityserver-discussions-components-files/96/sample_json.zip:320:240]

Drag on over to ION Desk and expand the Connect menu option to drill down to Data Flows. Select the + Add tile and select the Data Lake Flow option. Define a name for your flow - in this case, Community - and then begin building your flow by dragging the File activity from the Activity Toolbar into an empty part of your flow modeler.

Once you've dropped that in, select the connection point you defined earlier - again, Community - and select the document icon in the flow model to define your document. We should see our Products option available to us since we previously configured our connection point to support polling for documents and associating it with our Products metadata. Next, drag in the Ingest activity and drop it after your file connection point.

Select the Save icon in the toolbar and you're ready to Activate.

Depending on how often you've defined your polling schedule, you can check OneView to monitor when ION successfully pulled that document from the sFTP connection point.

Querying our data

In our last step, we've already confirmed via ION OneView that we've picked up a file called Products and moved it over to the Data Lake. You can likewise navigate to your file server and explore your directories to see how ION processed your file after the read was completed. In my case, it moved our products_1.json file over to our Archive folder.

All that's left to do is navigate to ION Desk, expand the Data Lake navigation menu, and select the Compass query editor. You can either write your own free-form query such as:

SELECT * FROM Products

Alternatively, you can use Compass' built-in SQL generation tool to automate simple queries for you. Simply find or filter for your Products table in the metadata panel. While hovering over your table, select the . . . icon, hover over the SQL family of navigation options and, finally, choose the SELECT option. We'll cover the other options in future How To guides.

Now, hit the Run Query button or use the keyboard shortcut (Windows: CTRL + Enter; Mac: ⌘ + Return).

What's next?

Well, that's really up to you. Your CloudSuite applications replicate tons of data today and you're learning how to ingest data from 3rd party & general technology data sources. Using the Compass editor, you can:

Explore & analyze data in Data Lake
Prototype queries for ETL processes
Filter, join, and transform data
Query the Data Lake via Compass APIs with 3rd party tools

In our next How To, we'll continue expanding on the file use case and start exploring what impact metadata has on the types of queries you can run in the Data Lake.

Until next time, hope this has helped introduce some new ideas!

@mike

Comments

Legacy Contributor

Hi Mike,

I used the Products.zip file to import the JSON schema into the Data Catalog and used Products_1.json as the payload.

Both of them were imported into the Data Lake, but I can see an invalid JSON message in OneView. Could you please let us know which ndjson format OneView supports?

Thanks,

Anish k.

Kevin Heiman

I would suggest http://jsonlines.org/ http://ndjson.org/ (pasted wrong URL)

Can you use the ION API Payloads API to see the file ingested?

I am assuming you used a File Connector via EC to load the file, can you outline that?

Does the source file match the API Payloads result ?

A two line sample file:

{"item_id":"item1","description":"MyItem1","status":"New","base_uom":"EA"}
{"item_id":"item2","description":"MyItem2","status":"New","base_uom":"EA"}

Legacy Contributor

Hey Anish,

I believe there was a bug that's being fixed in OneView. OneView attempts to validate ndjson using a JSON parser which obviously leads to it reporting it as invalid. However, you should still be able to query the data in Compass.

Can you try to query the file with Compass and see if that returns a result?
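For anyone curious why a plain JSON parser rejects a perfectly good ndjson file, here's a small Python illustration using the two-line sample from above. It's only a sketch of the general behavior, not OneView's actual validation code.

```python
import json

ndjson_text = (
    '{"item_id":"item1","description":"MyItem1","status":"New","base_uom":"EA"}\n'
    '{"item_id":"item2","description":"MyItem2","status":"New","base_uom":"EA"}\n'
)

# Parsing the whole file as ONE JSON document fails: after the first object
# ends, the parser finds a second object it doesn't expect.
try:
    json.loads(ndjson_text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing line by line, which is how ndjson is meant to be read, succeeds.
rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

print(whole_file_ok, len(rows))  # False 2
```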

Legacy Contributor

Hi Mike,

Thanks for the response.

I am able to query the file with Compass, and it returns data.

Thanks,

Anish k.

Copyright © 2025 Infor. All rights reserved.