nexusstc/The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems/3fba5b38464d1a0a1be7429b8139c1b1.pdf
The practice of cloud system administration : designing and operating large distributed systems. Volume 2
Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan
Addison Wesley, The practice of system administration, 2, 2014
English [en] · PDF · 5.9MB · 2014 · 📘 Book (non-fiction) · 🚀/lgli/lgrs/nexusstc/zlib
Description
“There's an incredible amount of depth and thinking in the practices described here, and it's impressive to see it all in one place.” --Win Treese, coauthor of Designing Systems for Internet Commerce. The Practice of Cloud System Administration, Volume 2, focuses on “distributed” or “cloud” computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach. Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics. Designing and building modern web and distributed systems: Fundamentals of large system design; Understand the new software engineering implications of cloud administration; Make systems that are resilient to failure and grow and scale dynamically; Implement DevOps principles and cultural changes; IaaS/PaaS/SaaS and virtual platform selection. Operating and running systems using the latest DevOps/SRE strategies: Upgrade production systems with zero down-time; What and how to automate, and how to decide what not to automate; On-call best practices that improve uptime; Why distributed systems require fundamentally different system administration techniques; Identify and resolve resiliency problems before they surprise you; Assessing and evaluating your team's operational effectiveness; Manage the scientific process of continuous improvement; A forty-page, pain-free assessment system you can start using today.
Alternative filename
lgli/The Practice of Cloud System Administration_ Designing and Operating Large Distributed Systems.pdf
Alternative filename
lgrsnf/The Practice of Cloud System Administration_ Designing and Operating Large Distributed Systems.pdf
Alternative filename
zlib/Computers/Networking/Thomas A. Limoncelli , Strata R. Chalup , Christina J. Hogan/The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems_3570779.pdf
Alternative title
Practice of Cloud System Administration, The: Designing and Operating Large Distributed Systems, Volume 2
Alternative title
Practice of Cloud System Administration, The : DevOps and SRE Practices for Web Services, Volume 2
Alternative author
Limoncelli, Thomas; Chalup, Strata; Hogan, Christina
Alternative publisher
Addison-Wesley Longman, Incorporated
Alternative publisher
Addison-Wesley Professional
Alternative publisher
Longman Publishing
Alternative publisher
Adobe Press
Alternative edition
Pearson Education (US), Upper Saddle River, NJ, 2015
Alternative edition
United States, United States of America
Alternative edition
Sep 13, 2014
Metadata comments
0
Metadata comments
lg2246664
Metadata comments
{"isbns":["032194318X","9780321943187"],"last_page":559,"publisher":"Addison Wesley","series":"The practice of system administration","volume":"2"}
Metadata comments
Source title: Practice of Cloud System Administration, The: Designing and Operating Large Distributed Systems, Volume 2
Alternative description
Cover......Page 1
Title Page......Page 4
Copyright Page......Page 5
Contents......Page 8
Preface......Page 24
About the Authors......Page 30
Introduction......Page 32
Part I: Design: Building It......Page 38
1 Designing in a Distributed World......Page 40
1.1 Visibility at Scale......Page 41
1.2 The Importance of Simplicity......Page 42
1.3 Composition......Page 43
1.4 Distributed State......Page 48
1.5 The CAP Principle......Page 52
1.6 Loosely Coupled Systems......Page 55
1.7 Speed......Page 57
1.8 Summary......Page 60
Exercises......Page 61
2.1 Operational Requirements......Page 62
2.2 Implementing Design for Operations......Page 76
2.3 Improving the Model......Page 79
2.4 Summary......Page 80
Exercises......Page 81
3 Selecting a Service Platform......Page 82
3.1 Level of Service Abstraction......Page 83
3.2 Type of Machine......Page 87
3.3 Level of Resource Sharing......Page 93
3.4 Colocation......Page 96
3.5 Selection Strategies......Page 97
Exercises......Page 99
4 Application Architectures......Page 100
4.1 Single-Machine Web Server......Page 101
4.2 Three-Tier Web Service......Page 102
4.3 Four-Tier Web Service......Page 108
4.5 Cloud-Scale Service......Page 111
4.6 Message Bus Architectures......Page 116
4.7 Service-Oriented Architecture......Page 121
4.8 Summary......Page 123
Exercises......Page 124
5 Design Patterns for Scaling......Page 126
5.1 General Strategy......Page 127
5.2 Scaling Up......Page 129
5.3 The AKF Scaling Cube......Page 130
5.4 Caching......Page 135
5.5 Data Sharding......Page 141
5.6 Threading......Page 143
5.7 Queueing......Page 144
5.8 Content Delivery Networks......Page 145
Exercises......Page 147
6 Design Patterns for Resiliency......Page 150
6.1 Software Resiliency Beats Hardware Reliability......Page 151
6.2 Everything Malfunctions Eventually......Page 152
6.3 Resiliency through Spare Capacity......Page 155
6.4 Failure Domains......Page 157
6.5 Software Failures......Page 159
6.6 Physical Failures......Page 162
6.7 Overload Failures......Page 169
6.8 Human Error......Page 172
6.9 Summary......Page 173
Exercises......Page 174
Part II: Operations: Running It......Page 176
7 Operations in a Distributed World......Page 178
7.1 Distributed Systems Operations......Page 179
7.2 Service Life Cycle......Page 186
7.3 Organizing Strategy for Operational Teams......Page 191
7.4 Virtual Office......Page 197
7.5 Summary......Page 198
Exercises......Page 199
8 DevOps Culture......Page 202
8.1 What Is DevOps?......Page 203
8.2 The Three Ways of DevOps......Page 207
8.3 History of DevOps......Page 211
8.4 DevOps Values and Principles......Page 212
8.5 Converting to DevOps......Page 217
8.6 Agile and Continuous Delivery......Page 219
8.7 Summary......Page 223
Exercises......Page 224
9 Service Delivery: The Build Phase......Page 226
9.1 Service Delivery Strategies......Page 228
9.2 The Virtuous Cycle of Quality......Page 231
9.3 Build-Phase Steps......Page 233
9.5 Continuous Integration......Page 236
9.6 Packages as Handoff Interface......Page 238
9.7 Summary......Page 239
Exercises......Page 240
10.1 Deployment-Phase Steps......Page 242
10.2 Testing and Approval......Page 245
10.4 Infrastructure Automation Strategies......Page 248
10.6 Infrastructure as Code......Page 252
10.8 Summary......Page 253
Exercises......Page 254
11.1 Taking the Service Down for Upgrading......Page 256
11.2 Rolling Upgrades......Page 257
11.3 Canary......Page 258
11.4 Phased Roll-outs......Page 260
11.7 Toggling Features......Page 261
11.8 Live Schema Changes......Page 265
11.10 Continuous Deployment......Page 267
11.11 Dealing with Failed Code Pushes......Page 270
11.12 Release Atomicity......Page 271
Exercises......Page 272
12 Automation......Page 274
12.1 Approaches to Automation......Page 275
12.2 Tool Building versus Automation......Page 281
12.3 Goals of Automation......Page 283
12.4 Creating Automation......Page 286
12.6 Language Tools......Page 289
12.7 Software Engineering Tools and Techniques......Page 293
12.8 Multitenant Systems......Page 301
12.9 Summary......Page 302
Exercises......Page 303
13.1 Design Documents Overview......Page 306
13.2 Design Document Anatomy......Page 308
13.4 Document Archive......Page 310
13.5 Review Workflows......Page 311
13.6 Adopting Design Documents......Page 313
13.7 Summary......Page 314
Exercises......Page 315
14.1 Designing Oncall......Page 316
14.2 Being Oncall......Page 325
14.3 Between Oncall Shifts......Page 330
14.4 Periodic Review of Alerts......Page 333
14.5 Being Paged Too Much......Page 335
14.6 Summary......Page 336
Exercises......Page 337
15 Disaster Preparedness......Page 338
15.1 Mindset......Page 339
15.2 Individual Training: Wheel of Misfortune......Page 342
15.3 Team Training: Fire Drills......Page 343
15.4 Training for Organizations: Game Day/DiRT......Page 346
15.5 Incident Command System......Page 354
15.6 Summary......Page 360
Exercises......Page 361
16 Monitoring Fundamentals......Page 362
16.1 Overview......Page 363
16.2 Consumers of Monitoring Information......Page 365
16.3 What to Monitor......Page 367
16.4 Retention......Page 369
16.5 Meta-monitoring......Page 370
16.6 Logs......Page 371
Exercises......Page 373
17 Monitoring Architecture and Practice......Page 376
17.1 Sensing and Measurement......Page 377
17.2 Collection......Page 381
17.3 Analysis and Computation......Page 384
17.4 Alerting and Escalation Manager......Page 385
17.5 Visualization......Page 389
17.7 Configuration......Page 393
17.8 Summary......Page 394
Exercises......Page 395
18 Capacity Planning......Page 396
18.1 Standard Capacity Planning......Page 397
18.2 Advanced Capacity Planning......Page 402
18.3 Resource Regression......Page 412
18.4 Launching New Services......Page 413
18.5 Reduce Provisioning Time......Page 415
18.6 Summary......Page 416
Exercises......Page 417
19 Creating KPIs......Page 418
19.1 What Is a KPI?......Page 419
19.2 Creating KPIs......Page 420
19.3 Example KPI: Machine Allocation......Page 424
19.4 Case Study: Error Budget......Page 427
Exercises......Page 430
20.1 What Does Operational Excellence Look Like?......Page 432
20.2 How to Measure Greatness......Page 433
20.3 Assessment Methodology......Page 434
20.4 Service Assessments......Page 438
20.5 Organizational Assessments......Page 442
20.6 Levels of Improvement......Page 443
20.7 Getting Started......Page 444
20.8 Summary......Page 445
Exercises......Page 446
Epilogue......Page 448
Part III: Appendices......Page 450
A: Assessments......Page 452
A.1 Regular Tasks (RT)......Page 454
A.2 Emergency Response (ER)......Page 457
A.3 Monitoring and Metrics (MM)......Page 459
A.4 Capacity Planning (CP)......Page 462
A.5 Change Management (CM)......Page 464
A.6 New Product Introduction and Removal (NPI/NPR)......Page 466
A.7 Service Deployment and Decommissioning (SDD)......Page 468
A.8 Performance and Efficiency (PE)......Page 470
A.9 Service Delivery: The Build Phase......Page 473
A.10 Service Delivery: The Deployment Phase......Page 475
A.11 Toil Reduction......Page 477
A.12 Disaster Preparedness......Page 479
B: The Origins and Future of Distributed Computing and Clouds......Page 482
Availability Requirements......Page 483
Technology......Page 484
Costs......Page 485
Technology......Page 486
Scaling......Page 487
High Availability......Page 488
B.3 The Dot-Bomb Era (2000–2003)......Page 490
Technology......Page 491
High Availability......Page 492
Scaling......Page 493
Costs......Page 495
Technology......Page 496
High Availability......Page 497
Scaling......Page 498
Costs......Page 499
Costs......Page 500
Scaling and High Availability......Page 502
B.6 Conclusion......Page 503
Exercises......Page 504
C.1 Constant, Linear, and Exponential Scaling......Page 506
C.2 Big O Notation......Page 507
C.3 Limitations of Big O Notation......Page 509
D.1 Design Document Template......Page 512
D.2 Design Document Example......Page 513
D.3 Sample Postmortem Template......Page 515
E: Recommended Reading......Page 518
Bibliography......Page 522
A......Page 530
B......Page 532
C......Page 533
D......Page 535
E......Page 538
G......Page 539
H......Page 540
J......Page 541
L......Page 542
M......Page 543
N......Page 544
O......Page 545
P......Page 546
R......Page 548
S......Page 549
T......Page 552
W......Page 554
Z......Page 555
Alternative description
"The Practice of Cloud System Administration, Volume 2, focuses on 'distributed' or 'cloud' computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach. Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics: Designing and building modern web and distributed systems: Fundamentals of large system design; Understand the new software engineering implications of cloud administration; Make systems that are resilient to failure and grow and scale dynamically; Implement DevOps principles and cultural changes; IaaS/PaaS/SaaS and virtual platform selection; Operating and running systems using the latest DevOps/SRE strategies: Upgrade production systems with zero down-time; What and how to automate, how to decide what not to automate; On-call best practices that improve uptime; Why distributed systems require fundamentally different system administration techniques; Identify and resolve resiliency problems before they surprise you; Assessing and evaluating your team's operational effectiveness; Manage the scientific process of continuous improvement; A forty-page, pain-free assessment system you can start using today"--Publisher's description
Alternative description
Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan. Includes bibliographical references and index.
Date open sourced
2018-08-07