I'm just starting with the ION ETL tool. So far I have all the GHR tables being transferred to my local SQL database (TEST); I'll work on FSM next. The initial download went well, except for running the JVM out of heap space. Hopefully I've fixed that.
I missed implementing logging when I set it up, so now I'm going back, making those changes, and re-running the transformation to validate it. I'm noticing that even though no data has changed in my Test DataLake, every single record originally downloaded is being processed again. For instance, EmployeeFieldHistory is still running after 2.5 hours and has processed over 7 million records.
I checked sys.dm_db_index_usage_stats, and sure enough it shows over 7 million "user_updates":
-- Check how many updates the last run actually wrote to the table
SELECT OBJECT_NAME(object_id) AS TableName,
       last_user_update,
       user_updates,
       *
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('INFORDWHR_SRC')
  AND object_id = OBJECT_ID('EmployeeFieldHistory');
Is this just a one-off, or will this full scan occur every time I run it? If it happens every time, then it is of no use to me.
Why don't we use RepSet_Variation_ID instead of the new DLDocDate field? I would think the initial query would take the max RepSet_Variation_ID already downloaded locally and pull only the rows with a greater value from the DataLake (see the rough sketch below).
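Just to illustrate what I mean, here is a rough sketch of the incremental pull I have in mind. The object names are placeholders from my own setup, and it assumes RepSet_Variation_ID only ever increases as changes are replicated, so treat it as an idea rather than the template's actual logic:

-- Sketch only: assumes RepSet_Variation_ID increases with every replicated change
DECLARE @LastVariation BIGINT;

-- Highest variation ID already present in my local copy
SELECT @LastVariation = MAX(RepSet_Variation_ID)
FROM dbo.EmployeeFieldHistory;          -- local table in INFORDWHR_SRC

-- Pull only rows newer than what I already have
SELECT *
FROM DataLake_EmployeeFieldHistory      -- placeholder for the DataLake source query
WHERE RepSet_Variation_ID > ISNULL(@LastVariation, 0);

That way only new or changed rows would come across, instead of all 7 million every run.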
What am I missing here? It seems so simple to me, yet it is not the standard in your template.
I will admit I have my own set of tables in the DataLake that I use; the tables are not updated by anything other than replication, and all fields are downloaded to my SQL database with no modifications.
Thank you