So I've roughly navigated this course, and it was quite a challenge going through the theory, because once the architectures were discussed it was just a whole bunch of clicking through AWS, and that can get boring.

Architecture:
For the architecture itself, the idea is a back office and a front office. The back office is all the sources and individual processes that bring in data. The data warehouse (DW) simplifies these schemas, possibly into a star model, which makes the data easier and faster to use for the analytics / BI division (the front office). There are a few variants: BI can access the main source directly, the DW can be department specific, or each department can have its own DW while still maintaining integrity among the common columns.

Cloud / AWS:
Doing this in the cloud provides a quicker start time, elasticity and scalability. The general idea is to read from the sources, move the data to a staging S3 bucket and then push it to the DW. For smaller tables it might be possible to directly use an EC2 inst
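To make the star model concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real cloud DW. The dataset is hypothetical: one fact table (fact_sales) holding the measures, pointing at two dimension tables (dim_date, dim_product) that hold the descriptive attributes BI filters and groups by. All table names, columns and rows are made up for illustration.

```python
import sqlite3

def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a tiny, hypothetical star schema and load a few sample rows."""
    cur = conn.cursor()
    # Dimension tables: descriptive attributes used for filtering and grouping.
    cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER)")
    cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT)")
    # Fact table: one row per sale, numeric measure plus a foreign key to each dimension.
    cur.execute(
        "CREATE TABLE fact_sales ("
        " date_key INTEGER REFERENCES dim_date(date_key),"
        " product_key INTEGER REFERENCES dim_product(product_key),"
        " amount REAL)"
    )
    cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2023, 2)])
    cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(10, "books"), (20, "games")])
    cur.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 10, 12.5), (1, 20, 40.0), (2, 10, 7.5), (2, 20, 20.0)],
    )
    conn.commit()

def revenue_by_month_and_category(conn: sqlite3.Connection) -> list:
    # A typical BI query: join the fact table to each dimension (one hop each), then aggregate.
    query = """
        SELECT d.year, d.month, p.category, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_date AS d ON f.date_key = d.date_key
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    for row in revenue_by_month_and_category(conn):
        print(row)
```

Every dimension is exactly one join away from the fact table, which is a big part of why this shape is simpler and faster for the front office to query than the normalized source schemas.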
Relational Model
Others that competed and did not last:
- network model
- hierarchical model
- XML databases
- object databases

NoSQL:
- specialized query operations
- more expressive data model
- polyglot persistence (relational and non-relational datastores used side by side)

Impedance mismatch: the mismatch when moving data between object-oriented application code and relational tables. It is reduced by a translation layer (e.g. an ORM), or by a representation like JSON that is closer to the application's objects.

JSON (document databases):
- flexible schema
- better locality: sub-categories are stored in one place instead of being reassembled with complex joins
- hard for many-to-many relations (easier in SQL)
- closer to the data structures used by the application layer
- schema on read instead of schema on write

Network model:
- tree structure like the hierarchical model, but allows multiple parents and therefore many-to-many relations
- pointers and access paths instead of joins as in SQL
- complicated code, even though it was efficient on small drives

SQL:
- no access paths, just individual tables
- access paths are chosen on the fly by the query optimizer
- indexes can be changed without changing the tables
- concurrent
- fault tolerant
- shredding: splitting a document-like structure into multiple tables

Graph Model

Query Languages
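A minimal sketch of the document vs. relational trade-off in plain Python, assuming a hypothetical user profile (every name and field here is made up). The nested document keeps its one-to-many positions in one place (locality); shred splits it into rows for two tables the way a relational schema would; reassemble shows the join-like work needed to get the tree back; and first_name illustrates schema on read, where the reader, not the database, copes with documents written before a field existed.

```python
import json

# Hypothetical profile stored as one JSON document: the one-to-many
# "positions" live inside it, so a single read returns everything.
profile_doc = json.loads("""
{
  "user_id": 251,
  "name": "Ada Lovelace",
  "positions": [
    {"title": "Engineer", "org": "Acme"},
    {"title": "Analyst", "org": "Initech"}
  ]
}
""")

def shred(doc):
    """Split the document into rows for two relational tables (shredding)."""
    users = [(doc["user_id"], doc["name"])]
    positions = [(doc["user_id"], p["title"], p["org"]) for p in doc["positions"]]
    return users, positions

def reassemble(users, positions, user_id):
    """Rebuild the nested structure from the two tables: effectively a join on user_id."""
    name = next(n for uid, n in users if uid == user_id)
    return {
        "user_id": user_id,
        "name": name,
        "positions": [{"title": t, "org": o} for uid, t, o in positions if uid == user_id],
    }

def first_name(doc):
    """Schema on read: the structure is interpreted only when the data is read."""
    if "first_name" in doc:
        return doc["first_name"]
    # Older document written before "first_name" existed: derive it at read time.
    return doc["name"].split(" ")[0]

if __name__ == "__main__":
    users, positions = shred(profile_doc)
    print(users)
    print(positions)
    print(reassemble(users, positions, 251) == profile_doc)
    print(first_name(profile_doc))
```

Nothing here handles many-to-many data (say, several users pointing at one shared organization record); that is exactly where the document model gets awkward and joins in SQL are easier.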