Designing Data Intensive Applications - Ch 1 - Reliability, Scalability, Maintainability
Reliability
Scalability
Maintainability
Data systems as developers see them: databases, caches, search indexes, stream processing, batch processing.
How the data is distributed across disks
encoding data
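Reliability - the system anticipates faults and can tolerate them. Fault - one component deviating from its spec. Failure - the system as a whole stops providing the required service. Kinds of faults: hardware faults, software faults, human errors.
Reducing human errors: well-designed interfaces, decoupling the places where people make mistakes from the places where they cause failures, thorough testing, telemetry (monitoring).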
Scalability -
Load parameters
- requests per second to a web server
- ratio of reads to writes in a database
- cache hit rate
- active users
What matters could be the average case, or the bottleneck could be a small number of extreme cases.
Twitter example :
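Posting a tweet: 4.6k requests per second on average, over 12k per second at peak. Home timeline reads: 300k requests per second. So the work is done at write time: each new tweet is pushed into the home timeline cache of every follower, so that reads are fast. Writes become the challenge because a single tweet can fan out to a very large number of followers. Twitter still tries to deliver tweets to followers within 5 seconds.
Twitter now uses a hybrid model: most tweets follow the approach above, but tweets from celebrities (users with a very large number of followers) are fetched at read time and merged into the timeline.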
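The hybrid approach can be sketched in a few lines. This is only an illustration, not Twitter's implementation: the in-memory dictionaries, the function names, and the CELEBRITY_THRESHOLD value are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical cutoff: authors with more followers than this are "celebrities".
CELEBRITY_THRESHOLD = 3

followers = defaultdict(set)          # author -> users who follow the author
following = defaultdict(set)          # user -> authors the user follows
timeline_cache = defaultdict(list)    # user -> tweets pushed at write time
tweets_by_author = defaultdict(list)  # author -> every tweet the author posted
_next_seq = 0                         # increasing counter used to order tweets


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author):
    return len(followers[author]) > CELEBRITY_THRESHOLD


def post_tweet(author, text):
    """Write path: fan out to follower caches unless the author is a celebrity."""
    global _next_seq
    _next_seq += 1
    tweet = (_next_seq, author, text)
    tweets_by_author[author].append(tweet)
    if not is_celebrity(author):
        for user in followers[author]:
            timeline_cache[user].append(tweet)


def home_timeline(user):
    """Read path: cached tweets plus celebrity tweets merged in at read time."""
    merged = set(timeline_cache[user])
    for author in following[user]:
        if is_celebrity(author):
            merged.update(tweets_by_author[author])
    # Newest first; the set removes duplicates if an author became a
    # celebrity after some tweets had already been fanned out.
    return sorted(merged, reverse=True)
```

An ordinary tweet costs one cache write per follower, while a celebrity tweet costs a single write and a little extra work on every read by a follower. The sketch ignores details such as back-filling a timeline when a user follows someone new.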
Performance :
throughput
response time
latency - the time that a request is latent OR waiting to be handled
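percentiles - median (p50) and high percentiles such as the 99th describe what users experience better than the mean
head-of-line blocking - a few slow requests hold up the requests queued behind them
tail latency amplification - when one user request needs several backend calls, a single slow call slows down the whole request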
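A rough way to see tail latency amplification is the arithmetic below. It is a sketch under two assumptions that are not in the notes: backend calls are slow independently of each other, and the user request has to wait for all of them. The function name is made up for the example.

```python
def chance_of_slow_request(p_slow_call: float, n_calls: int) -> float:
    """Probability that at least one of n independent backend calls is slow.

    Assumes every call is slow with the same probability p_slow_call and
    that the user request finishes only when all calls have finished.
    """
    return 1.0 - (1.0 - p_slow_call) ** n_calls
```

With a 1% chance of a slow call, a request that makes a single call is slow about 1% of the time, but a request that fans out to 100 calls is slow roughly 63% of the time.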
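computing response time percentiles: the naive method keeps every response time in the window and sorts the list; cheaper approximations include forward decay, t-digest, and HdrHistogram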
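The naive method of keeping all response times and sorting them might look like the sketch below. The nearest-rank definition of a percentile and the function names are choices made for this example; libraries such as HdrHistogram exist so that you do not have to keep and sort every sample.

```python
import math


def percentile(sorted_times, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_times:
        raise ValueError("need at least one response time")
    rank = math.ceil(p * len(sorted_times) / 100)  # 1-based rank
    return sorted_times[max(rank, 1) - 1]


def summarize(response_times_ms):
    """Mean, median and 99th percentile of the response times in one window."""
    ordered = sorted(response_times_ms)
    return {
        "mean": sum(ordered) / len(ordered),
        "p50": percentile(ordered, 50),
        "p99": percentile(ordered, 99),
    }
```

For a window where most requests take 10 ms and a couple take much longer, the mean is pulled up by the outliers and matches nobody's experience, while p50 shows the typical request and p99 shows how bad the slow ones are.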
how to deal with increase in loads
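scaling out (horizontal scaling, shared-nothing) - distributing the load across multiple smaller machines
scaling up (vertical scaling) - moving to a more powerful machine
manual scaling or elastic (automatic) scaling
stateful data systems are usually kept on a single node until cost or availability requirements force them to be distributed
stateless services are fairly straightforward to distribute across machines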
volume of reads, volume of writes, volume of data, data complexity, response time requirements, access patterns: the architecture is tailored to these
100,000 requests per second at 1 kB each is a very different system from 3 requests per minute at 2 GB each, even though both have the same data throughput.
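The "same throughput" claim can be checked with a little arithmetic, using 100,000 requests per second at 1 kB and 3 requests per minute at 2 GB. The sketch assumes decimal units (1 kB = 1,000 bytes, 1 GB = 1,000,000,000 bytes); the function name is made up for the example.

```python
KB = 1_000          # assumption: decimal kilobyte
GB = 1_000_000_000  # assumption: decimal gigabyte


def bytes_per_second(requests, per_seconds, bytes_per_request):
    """Data throughput for `requests` requests arriving every `per_seconds` seconds."""
    return requests * bytes_per_request / per_seconds


many_small = bytes_per_second(100_000, 1, 1 * KB)  # 100,000 req/s of 1 kB
few_large = bytes_per_second(3, 60, 2 * GB)        # 3 req/min of 2 GB
```

Both come out to 100 MB per second, yet the first system has to handle a huge number of tiny requests and the second has to move a few very large objects.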
Maintainability
Operability + Simplicity + Evolvability