Preface to the second edition
Welcome to the second edition of “R for Data Science”!
欢迎阅读 《R for Data Science》 第二版!
This is a major reworking of the first edition, removing material we no longer think is useful, adding material we wish we included in the first edition, and generally updating the text and code to reflect changes in best practices.
这是对第一版的一次重大重构:剔除了不再实用的内容,补充了当初想加入却未能包含的材料,并全面更新了文本和代码,以反映最新的最佳实践。
We’re also very excited to welcome a new co-author: Mine Çetinkaya-Rundel, a noted data-science educator and one of our colleagues at Posit (the company formerly known as RStudio).
我们还非常高兴地迎来新合著者 —— 著名的数据科学教育者、Posit(前身为 RStudio)同事 Mine Çetinkaya-Rundel。
A brief summary of the biggest changes follows:
以下简要概述本次最重要的改动:
The first part of the book has been renamed to “Whole game”.
The goal of this section is to give you the rough details of the “whole game” of data science before we dive into the details.
书的第一部分现已更名为 Whole game(完整全局)。本节旨在让你在深入细节之前,先对数据科学的 “完整游戏” 有一个整体了解。The second part of the book is “Visualize”.
This part gives data-visualization tools and best practices a more thorough coverage compared to the first edition.
The best place to get all the details is still the ggplot2 book, but now R4DS covers more of the most important techniques.
第二部分为 Visualize(可视化)。相比第一版,本版更系统地介绍数据可视化工具与最佳实践。详细信息仍以 《ggplot2》 一书为最佳参考,但 R4DS 现已涵盖更多关键方法。The third part of the book is now called “Transform” and gains new chapters on numbers, logical vectors, and missing values.
These were previously parts of the data-transformation chapter, but needed much more room to cover all the details.
第三部分现称为 Transform(转换),并新增了数字、逻辑向量和缺失值等章节。它们原先只是数据转换章节中的一部分,但如今需要更多篇幅深入介绍。The fourth part of the book is called “Import”.
It’s a new set of chapters that goes beyond reading flat text files to working with spreadsheets, getting data out of databases, working with big data, rectangling hierarchical data, and scraping data from web sites.
第四部分为 Import(导入)。这一新篇章不仅涉及读取纯文本文件,还涵盖电子表格、数据库提取、大数据处理、层级数据整形,以及网页数据抓取等主题。The “Program” part remains, but has been rewritten from top-to-bottom to focus on the most important parts of function writing and iteration.
Function writing now includes details on how to wrap tidyverse functions (dealing with the challenges of tidy evaluation), since this has become much easier and more important over the last few years.
We’ve added a new chapter on important base R functions that you’re likely to see in wild-caught R code.
“Program” 部分保留,但已彻底重写,重点聚焦函数编写与迭代的核心要素。现在的函数编写章节详细讲解了如何封装 tidyverse 函数(应对 tidy evaluation 的挑战),因为近年来这项任务变得更简单且更关键。我们还新增了讲解常见 base R 函数的新章节,以帮助你读懂“野生” R 代码。The modeling part has been removed.
We never had enough room to fully do modelling justice, and there are now much better resources available.
We generally recommend using the tidymodels packages and reading Tidy Modeling with R by Max Kuhn and Julia Silge.
模型部分已被移除。由于篇幅所限,我们无法充分展示建模的全部内容,而如今已有更优秀的资源可供学习。我们推荐使用 tidymodels 套件,并阅读 Max Kuhn 与 Julia Silge 合著的 Tidy Modeling with R。The “Communicate” part remains, but has been thoroughly updated to feature Quarto instead of R Markdown.
This edition of the book has been written in Quarto, and it’s clearly the tool of the future.
“Communicate” 部分仍在,但已全面更新为 Quarto,替代 R Markdown。本书第二版即使用 Quarto 编写,显然它将成为未来的首选工具。