Is there a way to clean up variations in the datalake at least for specific tables?
On the initial load of a table to my local DB using ION ETL, I'm seeing several hundred variations of the same table, causing tens of millions of rows to be updated. On successive updates this is greatly reduced. I am using the QueryAll function to return the data.
A couple of examples:
- BenefitGroupMember has a total of 7,453 records but initially processed 14,310,251. On the next load (no replication in between, just testing) only 48,128 records were processed, so greatly reduced.
- PfiWorkunitVariable has 3,960,411 records but is processing 86 million. Still going after 20 hours.
- PfiWorkunit has 562,701 records but processed 74 million.
Granted, I'm done with the initial load of these tables; however, if a user requests another field for one of them, a full load will have to be performed again.
The local copy of the data is only used for reporting; we currently replicate and refresh four times per day. We only need the latest record. If we need data about updates, we use the audit records for that table, but we only bring down audit records for a handful of tables, since most users only want the current iteration of the record.
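In case it helps clarify what I mean by "only need the latest record": the post-load cleanup we effectively want can be sketched as keeping, per primary key, only the row with the highest variation number. This is just an illustrative sketch, not the actual Data Lake schema; the column names "id" and "variation" are placeholders:

```python
# Sketch: collapse multiple variations of the same record down to the
# latest one before loading into the local reporting DB.
# "id" and "variation" are assumed placeholder column names, not the
# real Data Lake field names.

def latest_variations(rows, key="id", variation="variation"):
    """Return one row per key: the row with the highest variation number."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[variation] > current[variation]:
            latest[row[key]] = row
    return list(latest.values())

rows = [
    {"id": 1, "variation": 1, "status": "new"},
    {"id": 1, "variation": 3, "status": "updated"},
    {"id": 2, "variation": 1, "status": "new"},
    {"id": 1, "variation": 2, "status": "stale"},
]

# Keeps variation 3 for id 1 and variation 1 for id 2.
print(latest_variations(rows))
```

Ideally, of course, this filtering would happen on the Data Lake side so the extra variations are never transferred at all.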
Thank you for your time.