Hi,
I have an object in the data lake with more than 10K records. I am trying to use the splitquery Data Fabric API, but without results. I would appreciate a hint on how to extract those 10K records using the Data Fabric API.
Best,
Thomas
Here is an example I just tested that returned 4 queries.
/DATAFABRIC/datalake/v2/dataobjects/splitquery?filter=dl_document_name%20eq%20'LN_tcmcs098'&records=200
There are some considerations around the ratio of objects in the filter. How many files are in the data lake for the object you are requesting? How are you filtering, and how many records are you requesting in each filter? What response are you getting when you state "without results"? Does the filter work in a GET /dlobjects request?

Result:
[ { "queryFilter": "(document_name eq 'LN_tcmcs098' AND dl_id range {null,"1-52c03021-952a-3b27-837b-c65bccaf99b4"})", "sortFields": [ "dl_id:asc" ] }, { "queryFilter": "(document_name eq 'LN_tcmcs098' AND dl_id range ["1-52c03021-952a-3b27-837b-c65bccaf99b4","1-a3626013-0b52-3b4a-abcb-876964a34fd9"})", "sortFields": [ "dl_id:asc" ] }, { "queryFilter": "(document_name eq 'LN_tcmcs098' AND dl_id range ["1-a3626013-0b52-3b4a-abcb-876964a34fd9","1-f2356bac-2155-39e6-b42c-763806df0b0c"})", "sortFields": [ "dl_id:asc" ] }, { "queryFilter": "(document_name eq 'LN_tcmcs098' AND dl_id range ["1-f2356bac-2155-39e6-b42c-763806df0b0c",null})", "sortFields": [ "dl_id:asc" ] } ]
Thank you!
I have approximately 11K files of that object in the data lake (and expect significantly more in the future). My exact problem is with setting the number of records for the filter: when I went below 2000, I received an "Internal Server Error". It is not clear to me how to calculate a safe "records" number, so that it is not too big but still stays on the safe side and does not trigger the error.
OK, so the interface is working; it just has issues with the right calculation against the data. Note that on the filter you can further limit the data, commonly using a day range, so that you have less in scope.
We would expect that you are only pulling the data once and not pulling all the data every day. Initial loading/resets would be the exception to needing a larger quantity.
Consider using a time increment: set a starting value, pull the next increment, and keep pulling until you are current; then pull each increment going forward.
You could also consider using the 'extract' from the data lake in a flow to send you the files as they become available on a time interval. This essentially does all of this for you, just as individual files.
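As a rough illustration of the time-increment approach above, here is a sketch that walks a watermark forward one day at a time. The timestamp property (dl_event_date), the filter operators, and the endpoint details are assumptions; substitute whatever property and syntax your objects actually support.

```python
# Minimal sketch of the incremental pull described above.
# Property name, operators, and endpoint are assumptions.
from datetime import datetime, timedelta, timezone

import requests

BASE = "https://<ion-api-gateway>/<TENANT>/DATAFABRIC/datalake/v2"  # assumption
HEADERS = {"Authorization": "Bearer <access_token>"}                # assumption

def pull_window(start, end):
    """Pull one increment's worth of LN_tcmcs098 records (filter syntax assumed)."""
    window_filter = (
        "dl_document_name eq 'LN_tcmcs098' "
        f"AND dl_event_date ge '{start.isoformat()}' "
        f"AND dl_event_date lt '{end.isoformat()}'"
    )
    resp = requests.get(f"{BASE}/dlobjects",
                        headers=HEADERS,
                        params={"filter": window_filter})
    resp.raise_for_status()
    return resp.json()

# Start from a stored watermark, walk forward one day at a time until "now";
# afterwards, run the same pull on a schedule, one increment at a time.
watermark = datetime(2024, 1, 1, tzinfo=timezone.utc)  # assumption: persisted elsewhere
step = timedelta(days=1)
while watermark < datetime.now(timezone.utc):
    data = pull_window(watermark, watermark + step)
    # ... process/store data ...
    watermark += step
```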
Hi Kevin,
Thank you for the very valuable answer. I really appreciate it.
My understanding is that the data lake API returns the most recent variation of the object (?). This is not clearly described in the documentation, but at least based on my first test I assume this is the case. If so, adopting the incremental approach would mean that I have to find and update the corresponding record on the other side. Am I correct?
Could you please also advise why I got an "Internal Server Error" when I set the records number below 2000? Is there any predictable estimation I can make, or is this a matter of an internal timeout? I believe that calculating chunks (filters) on a huge dataset could be time/resource consuming.