Good afternoon,
I am sure there are some good resources about this here, but I am hoping someone can help narrow down a starting point. We have utility bill and energy consumption data coming in from multiple sources in varying formats (printed bills, Access databases, and CSV files). I would like to consolidate all of these sources into a single set of tables. Would the Data Lake be a good place to store this information? Could I build a BOD to handle a standardized CSV? Obviously, getting the data from the printed statements will still require some manual entry, unless we can use some sort of advanced OCR.
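To make "standardized CSV" a bit more concrete, here is a rough Python sketch of the kind of normalization step I have in mind, independent of where the data ends up. Everything in it is an assumption for illustration only: the target columns (account_id, period_start, period_end, usage_kwh, cost), the source name "vendor_a", and its column headers and date format are all hypothetical, not our real layouts.

```python
import csv
import io
from datetime import datetime

# Hypothetical target layout for the consolidated tables.
STANDARD_FIELDS = ["account_id", "period_start", "period_end", "usage_kwh", "cost"]

# Hypothetical per-source mapping from source column names to standard names.
SOURCE_MAPPINGS = {
    "vendor_a": {
        "Acct": "account_id",
        "From": "period_start",
        "To": "period_end",
        "kWh": "usage_kwh",
        "Amount": "cost",
    },
}

# Hypothetical date format used by each source.
DATE_FORMATS = {"vendor_a": "%m/%d/%Y"}


def normalize_row(source, row):
    """Rename columns and coerce dates and numbers for one source row."""
    mapping = SOURCE_MAPPINGS[source]
    out = {mapping[k]: v.strip() for k, v in row.items() if k in mapping}
    # Convert dates to ISO 8601 so every source ends up in the same format.
    for field in ("period_start", "period_end"):
        parsed = datetime.strptime(out[field], DATE_FORMATS[source])
        out[field] = parsed.date().isoformat()
    # Strip thousands separators and currency symbols before converting.
    out["usage_kwh"] = float(out["usage_kwh"].replace(",", ""))
    out["cost"] = float(out["cost"].replace("$", "").replace(",", ""))
    return out


def standardize_csv(source, text):
    """Read one source's CSV text and return it as standardized CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [normalize_row(source, r) for r in reader]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=STANDARD_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The idea would be to add one mapping entry per source (including an export from the Access databases), so that whatever loads the data downstream only ever sees one layout. Manually keyed data from the printed bills could be typed straight into the same standard columns.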
That's probably clear as mud, but any input is appreciated!
Thanks,
Nick