Udacity - Data Engineering - Cloud Data Warehousing
So I've roughly navigated this course and it was quite a challenge going through the theory because once the architectures were discussed it was just a whole bunch of clicking through in AWS and that can get boring. Architecture : For the architecture itself, the idea is a back office and a front office. The back office is all the sources and individual processes that bring in data. The Data warehouse simplifies these schemas into (possibly) a star model and then makes it easier and faster to use for the analytics / BI division (front office). There are a few variants, where BI can directly access the main source, where DW can be department specific, or unique for each department still maintaining integrity among common columns. Cloud / AWS : Doing this in the cloud provides quicker start time, elasticity, scalability. The general idea is to read from sources and move to a staging S3 bucket and then push to a DW. For smaller tables it might be possible to directly use an EC2 inst