Reliability, Scalability, Maintainability

Data systems as developers see them: databases, caches, search indexes, stream processing, batch processing; plus how data is distributed across disks and how it is encoded.

Reliability - anticipating faults and being able to tolerate them. A fault is a component deviating from its spec; a failure is the system as a whole stopping. Fault types: hardware faults, software faults, human errors. Mitigations: telemetry, well-designed interfaces, decoupling, testing.

Scalability - described via load parameters: requests to a web server, reads and writes to a database, cache hit rate, number of active users. These can be characterized by the average case or by a small number of extreme cases.

Twitter example: posting tweets averages around 4.6k writes per second (12k at peak), but home timeline reads run at around 300k per second. So the work is shifted to write time: each tweet is pushed into the timeline caches of the author's followers, making reads fast. Writes become the challenge when they involve that much leg work, but delivery is still done within about 5 seconds. Twitter now uses a hybrid model where most tweets follow the above approach, but celebrity tweets are merged in at read time.

Performance: throughput...
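The fan-out-on-write idea from the Twitter example can be sketched as follows. This is a minimal illustration, not Twitter's actual implementation: in-memory dicts stand in for the follower graph and the per-user home-timeline caches, and all names are made up.

```python
from collections import defaultdict

# Illustrative stand-ins for the follower graph and per-user timeline caches.
followers = defaultdict(set)    # author -> set of follower ids
timelines = defaultdict(list)   # user   -> cached home timeline, newest first

def follow(follower, author):
    followers[author].add(follower)

def post_tweet(author, text):
    # Write-time leg work: push the tweet into every follower's cache,
    # so that reading a home timeline is a single cache lookup.
    for f in followers[author]:
        timelines[f].insert(0, (author, text))

def home_timeline(user):
    # Read path is cheap: no joins or scans, just the precomputed cache.
    return timelines[user]

follow("alice", "bob")
follow("carol", "bob")
post_tweet("bob", "hello")
print(home_timeline("alice"))  # [('bob', 'hello')]
```

The hybrid model would add a check in `post_tweet`: if the author's follower count exceeds some threshold, skip the fan-out and let followers merge that author's tweets in at read time instead.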
So I've roughly navigated this course, and it was quite a challenge getting through the theory, because once the architectures were discussed it was just a whole lot of clicking through AWS, and that can get boring.

Architecture: the core idea is a back office and a front office. The back office is all the sources and the individual processes that bring in data. The data warehouse simplifies these schemas, possibly into a star model, which makes the data easier and faster to use for the analytics / BI division (the front office). There are a few variants: BI can access the main source directly, the DW can be department-specific, or each department can have its own DW while still maintaining integrity across common columns.

Cloud / AWS: doing this in the cloud provides quicker start-up time, elasticity, and scalability. The general idea is to read from sources, move the data to a staging S3 bucket, and then push it into a DW. For smaller tables it might be possible to directly use an EC2...
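The S3-staging-to-warehouse step typically comes down to a Redshift `COPY` statement. A hedged sketch of composing one in Python follows; the table, bucket, and IAM role names are placeholders, and actually running the statement would require a live cluster connection.

```python
def build_copy_statement(table, s3_path, iam_role, fmt="CSV"):
    """Compose a Redshift COPY statement that loads a staged S3 file
    into a warehouse table. All identifiers here are illustrative."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS {fmt};"
    )

# Hypothetical names, just to show the shape of the statement.
stmt = build_copy_statement(
    "staging_events",
    "s3://my-staging-bucket/events/",
    "arn:aws:iam::123456789012:role/dwhRole",
)
print(stmt)
```

In practice the statement would be executed against the cluster through a Postgres-compatible driver, with the IAM role granting Redshift read access to the staging bucket.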
This is a challenging course I embarked on two years back and could not complete at that time. I'm back in an attempt to broaden my skills, improve my mental stamina, and give myself something that can make me feel accomplished. It has 4 sections - Data Modeling - Cloud Data Warehousing - Spark and Data Lakes - Automating Data Pipelines. There is a challenging project for each of these sections and a final capstone project combining them. The technologies spanning these courses are SQL, NoSQL, basic Python, AWS, Spark, and Airflow.