The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2
Thomas A. Limoncelli; Strata R. Chalup; Christina J. Hogan · Addison-Wesley Professional, Sep 13, 2014
English [en] · PDF · 5.9MB · 2014 · 📘 Book (non-fiction) · 🚀/lgli/upload/zlib
Description
“There's an incredible amount of depth and thinking in the practices described here, and it's impressive to see it all in one place.” —Win Treese, coauthor of Designing Systems for Internet Commerce

The Practice of Cloud System Administration, Volume 2, focuses on “distributed” or “cloud” computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach. Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics:

Designing and building modern web and distributed systems
  • Fundamentals of large system design
  • Understand the new software engineering implications of cloud administration
  • Make systems that are resilient to failure and grow and scale dynamically
  • Implement DevOps principles and cultural changes
  • IaaS/PaaS/SaaS and virtual platform selection

Operating and running systems using the latest DevOps/SRE strategies
  • Upgrade production systems with zero down-time
  • What and how to automate; how to decide what not to automate
  • On-call best practices that improve uptime
  • Why distributed systems require fundamentally different system administration techniques
  • Identify and resolve resiliency problems before they surprise you

Assessing and evaluating your team's operational effectiveness
  • Manage the scientific process of continuous improvement
  • A forty-page, pain-free assessment system you can start using today
Alternative filename
lgli/Thomas A. Limoncellii; Strata R. Chalup; Christina J. Hogan & Strata R. Chalup & Christina J. Hogan - The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2 (2014, ).pdf
Alternative filename
zlib/Computers/Networking/Thomas A. Limoncellii; Strata R. Chalup; Christina J. Hogan & Strata R. Chalup & Christina J. Hogan/The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2_19311782.pdf
Alternative title
Practice of Cloud System Administration, The: Designing and Operating Large Distributed Systems, Volume 2
Alternative title
The practice of cloud system administration : designing and operating large distributed systems. Volume 2
Alternative title
Practice of Cloud System Administration, The : DevOps and SRE Practices for Web Services, Volume 2
Alternative author
Limoncelli, Thomas, Chalup, Strata, Hogan, Christina
Alternative publisher
Addison-Wesley Longman, Incorporated
Alternative publisher
Longman Publishing
Alternative publisher
Adobe Press
Alternative edition
Pearson Education (US), Upper Saddle River, NJ, 2015
Alternative edition
United States, United States of America
Metadata comments
producers:
iText 2.1.7 by 1T3XT; modified using iText® 5.5.6 ©2000-2015 iText Group NV (AGPL-version)
Metadata comments
Source title: Practice of Cloud System Administration, The: Designing and Operating Large Distributed Systems, Volume 2
Alternative description
Cover
Title Page
Copyright Page
Contents
Preface
About the Authors
Introduction
Part I: Design: Building It
1 Designing in a Distributed World
1.1 Visibility at Scale
1.2 The Importance of Simplicity
1.3 Composition
1.4 Distributed State
1.5 The CAP Principle
1.6 Loosely Coupled Systems
1.7 Speed
1.8 Summary
Exercises
2 Designing for Operations
2.1 Operational Requirements
2.2 Implementing Design for Operations
2.3 Improving the Model
2.4 Summary
Exercises
3 Selecting a Service Platform
3.1 Level of Service Abstraction
3.2 Type of Machine
3.3 Level of Resource Sharing
3.4 Colocation
3.5 Selection Strategies
3.6 Summary
Exercises
4 Application Architectures
4.1 Single-Machine Web Server
4.2 Three-Tier Web Service
4.3 Four-Tier Web Service
4.4 Reverse Proxy Service
4.5 Cloud-Scale Service
4.6 Message Bus Architectures
4.7 Service-Oriented Architecture
4.8 Summary
Exercises
5 Design Patterns for Scaling
5.1 General Strategy
5.2 Scaling Up
5.3 The AKF Scaling Cube
5.4 Caching
5.5 Data Sharding
5.6 Threading
5.7 Queueing
5.8 Content Delivery Networks
5.9 Summary
Exercises
6 Design Patterns for Resiliency
6.1 Software Resiliency Beats Hardware Reliability
6.2 Everything Malfunctions Eventually
6.3 Resiliency through Spare Capacity
6.4 Failure Domains
6.5 Software Failures
6.6 Physical Failures
6.7 Overload Failures
6.8 Human Error
6.9 Summary
Exercises
Part II: Operations: Running It
7 Operations in a Distributed World
7.1 Distributed Systems Operations
7.2 Service Life Cycle
7.3 Organizing Strategy for Operational Teams
7.4 Virtual Office
7.5 Summary
Exercises
8 DevOps Culture
8.1 What Is DevOps?
8.2 The Three Ways of DevOps
8.3 History of DevOps
8.4 DevOps Values and Principles
8.5 Converting to DevOps
8.6 Agile and Continuous Delivery
8.7 Summary
Exercises
9 Service Delivery: The Build Phase
9.1 Service Delivery Strategies
9.2 The Virtuous Cycle of Quality
9.3 Build-Phase Steps
9.4 Build Console
9.5 Continuous Integration
9.6 Packages as Handoff Interface
9.7 Summary
Exercises
10 Service Delivery: The Deployment Phase
10.1 Deployment-Phase Steps
10.2 Testing and Approval
10.3 Operations Console
10.4 Infrastructure Automation Strategies
10.5 Continuous Delivery
10.6 Infrastructure as Code
10.7 Other Platform Services
10.8 Summary
Exercises
11 Upgrading Live Services
11.1 Taking the Service Down for Upgrading
11.2 Rolling Upgrades
11.3 Canary
11.4 Phased Roll-outs
11.5 Proportional Shedding
11.6 Blue-Green Deployment
11.7 Toggling Features
11.8 Live Schema Changes
11.9 Live Code Changes
11.10 Continuous Deployment
11.11 Dealing with Failed Code Pushes
11.12 Release Atomicity
11.13 Summary
Exercises
12 Automation
12.1 Approaches to Automation
12.2 Tool Building versus Automation
12.3 Goals of Automation
12.4 Creating Automation
12.5 How to Automate
12.6 Language Tools
12.7 Software Engineering Tools and Techniques
12.8 Multitenant Systems
12.9 Summary
Exercises
13 Design Documents
13.1 Design Documents Overview
13.2 Design Document Anatomy
13.3 Template
13.4 Document Archive
13.5 Review Workflows
13.6 Adopting Design Documents
13.7 Summary
Exercises
14 Oncall
14.1 Designing Oncall
14.2 Being Oncall
14.3 Between Oncall Shifts
14.4 Periodic Review of Alerts
14.5 Being Paged Too Much
14.6 Summary
Exercises
15 Disaster Preparedness
15.1 Mindset
15.2 Individual Training: Wheel of Misfortune
15.3 Team Training: Fire Drills
15.4 Training for Organizations: Game Day/DiRT
15.5 Incident Command System
15.6 Summary
Exercises
16 Monitoring Fundamentals
16.1 Overview
16.2 Consumers of Monitoring Information
16.3 What to Monitor
16.4 Retention
16.5 Meta-monitoring
16.6 Logs
16.7 Summary
Exercises
17 Monitoring Architecture and Practice
17.1 Sensing and Measurement
17.2 Collection
17.3 Analysis and Computation
17.4 Alerting and Escalation Manager
17.5 Visualization
17.6 Storage
17.7 Configuration
17.8 Summary
Exercises
18 Capacity Planning
18.1 Standard Capacity Planning
18.2 Advanced Capacity Planning
18.3 Resource Regression
18.4 Launching New Services
18.5 Reduce Provisioning Time
18.6 Summary
Exercises
19 Creating KPIs
19.1 What Is a KPI?
19.2 Creating KPIs
19.3 Example KPI: Machine Allocation
19.4 Case Study: Error Budget
19.5 Summary
Exercises
20 Operational Excellence
20.1 What Does Operational Excellence Look Like?
20.2 How to Measure Greatness
20.3 Assessment Methodology
20.4 Service Assessments
20.5 Organizational Assessments
20.6 Levels of Improvement
20.7 Getting Started
20.8 Summary
Exercises
Epilogue
Part III: Appendices
A: Assessments
A.1 Regular Tasks (RT)
A.2 Emergency Response (ER)
A.3 Monitoring and Metrics (MM)
A.4 Capacity Planning (CP)
A.5 Change Management (CM)
A.6 New Product Introduction and Removal (NPI/NPR)
A.7 Service Deployment and Decommissioning (SDD)
A.8 Performance and Efficiency (PE)
A.9 Service Delivery: The Build Phase
A.10 Service Delivery: The Deployment Phase
A.11 Toil Reduction
A.12 Disaster Preparedness
B: The Origins and Future of Distributed Computing and Clouds
B.1 The Pre-Web Era (1985–1994)
Availability Requirements
Technology
Scaling
High Availability
Costs
B.2 The First Web Era: The Bubble (1995–2000)
Availability Requirements
Technology
Scaling
High Availability
Costs
B.3 The Dot-Bomb Era (2000–2003)
Availability Requirements
Technology
High Availability
Scaling
Costs
B.4 The Second Web Era (2003–2010)
Availability Requirements
Technology
High Availability
Scaling
Costs
B.5 The Cloud Computing Era (2010–present)
Availability Requirements
Costs
Scaling and High Availability
Technology
B.6 Conclusion
Exercises
C: Scaling Terminology and Concepts
C.1 Constant, Linear, and Exponential Scaling
C.2 Big O Notation
C.3 Limitations of Big O Notation
D: Templates and Examples
D.1 Design Document Template
D.2 Design Document Example
D.3 Sample Postmortem Template
E: Recommended Reading
Bibliography
Index
Alternative description
"The Practice of Cloud System Administration, Volume 2, focuses on 'distributed' or 'cloud' computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach. Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics: Designing and building modern web and distributed systems: Fundamentals of large system design; Understand the new software engineering implications of cloud administration; Make systems that are resilient to failure and grow and scale dynamically; Implement DevOps principles and cultural changes; IaaS/PaaS/SaaS and virtual platform selection; Operating and running systems using the latest DevOps/SRE strategies: Upgrade production systems with zero down-time; What and how to automate, how to decide what not to automate; On-call best practices that improve uptime; Why distributed systems require fundamentally different system administration techniques; Identify and resolve resiliency problems before they surprise you; Assessing and evaluating your team's operational effectiveness; Manage the scientific process of continuous improvement; A forty-page, pain-free assessment system you can start using today"--Publisher's description
Alternative description
Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan. Includes bibliographical references and index.
Date open sourced
2022-03-08