y <- 1:4
mean(y)
8 Workflow: getting help
This book is not an island; there is no single resource that will allow you to master R. As you begin to apply the techniques described in this book to your own data, you will soon find questions that we do not answer. This section describes a few tips on how to get help and to help you keep learning.
本书不是一座孤岛;没有任何单一资源能让你精通 R。 当你开始将本书中描述的技术应用于你自己的数据时,你很快就会发现我们没有回答的问题。 本节将介绍一些获取帮助以及帮助你持续学习的技巧。
8.1 Google is your friend
If you get stuck, start with Google. Typically adding “R” to a query is enough to restrict it to relevant results: if the search isn’t useful, it often means that there aren’t any R-specific results available. Additionally, adding package names like “tidyverse” or “ggplot2” will help narrow down the results to code that will feel more familiar to you as well, e.g., “how to make a boxplot in R” vs. “how to make a boxplot in R with ggplot2”. Google is particularly useful for error messages. If you get an error message and you have no idea what it means, try googling it! Chances are that someone else has been confused by it in the past, and there will be help somewhere on the web. (If the error message isn’t in English, run Sys.setenv(LANGUAGE = "en")
and re-run the code; you’re more likely to find help for English error messages.)
如果你遇到困难,从 Google 开始。 通常,在查询中添加 “R” 就足以将其限制在相关的结果中:如果搜索结果不理想,这通常意味着没有 R 特定的结果可用。 此外,添加像 “tidyverse” 或 “ggplot2” 这样的包名也会帮助你将结果缩小到你更熟悉的代码,例如,“how to make a boxplot in R” 对比 “how to make a boxplot in R with ggplot2”。 Google 对于错误信息尤其有用。 如果你收到一条错误信息,并且不知道它是什么意思,试试用 Google 搜索它! 很可能过去有其他人也对此感到困惑,网上某个地方会有帮助。 (如果错误信息不是英文的,运行 Sys.setenv(LANGUAGE = "en")
并重新运行代码;你更有可能找到英文错误信息的帮助。)
If Google doesn’t help, try Stack Overflow. Start by spending a little time searching for an existing answer, including [R]
, to restrict your search to questions and answers that use R.
如果 Google 帮不了你,试试 Stack Overflow。 首先花点时间搜索现有的答案,记得加上 [R]
,将你的搜索限制在使用 R 的问题和答案上。
8.2 Making a reprex
If your googling doesn’t find anything useful, it’s a really good idea to prepare a reprex, short for minimal reproducible example. A good reprex makes it easier for other people to help you, and often you’ll figure out the problem yourself in the course of making it. There are two parts to creating a reprex:
如果你的 Google 搜索没有找到任何有用的东西,那么准备一个 reprex (最小可复现示例的缩写) 是个非常好的主意。 一个好的 reprex 能让其他人更容易帮助你,而且通常在制作 reprex 的过程中你就能自己找出问题所在。 创建 reprex 有两个部分:
First, you need to make your code reproducible. This means that you need to capture everything, i.e. include any
library()
calls and create all necessary objects. The easiest way to make sure you’ve done this is using the reprex package.首先,你需要让你的代码可复现。 这意味着你需要捕获所有东西,即包含任何
library()
调用并创建所有必要的对象。 确保做到这一点最简单的方法是使用 reprex 包。Second, you need to make it minimal. Strip away everything that is not directly related to your problem. This usually involves creating a much smaller and simpler R object than the one you’re facing in real life or even using built-in data.
其次,你需要让它最小化。 剔除所有与你的问题不直接相关的东西。 这通常涉及创建一个比你现实生活中面临的 R 对象小得多、简单得多的对象,甚至使用内置数据。
That sounds like a lot of work! And it can be, but it has a great payoff:
这听起来工作量很大! 事实可能如此,但它有巨大的回报:
80% of the time, creating an excellent reprex reveals the source of your problem. It’s amazing how often the process of writing up a self-contained and minimal example allows you to answer your own question.
80% 的情况下,创建一个出色的 reprex 会揭示你问题的根源。 令人惊讶的是,编写一个独立的最小示例的过程常常能让你自己回答自己的问题。
The other 20% of the time, you will have captured the essence of your problem in a way that is easy for others to play with. This substantially improves your chances of getting help!
另外 20% 的情况下,你将以一种便于他人操作的方式抓住了问题的本质。 这大大提高了你获得帮助的机会!
When creating a reprex by hand, it’s easy to accidentally miss something, meaning your code can’t be run on someone else’s computer. Avoid this problem by using the reprex package, which is installed as part of the tidyverse. Let’s say you copy this code onto your clipboard (or, on RStudio Server or Cloud, select it):
当手动创建 reprex 时,很容易不小心漏掉某些东西,这意味着你的代码无法在别人的电脑上运行。 通过使用 reprex 包可以避免这个问题,该包是作为 tidyverse 的一部分安装的。 假设你将这段代码复制到剪贴板(或者,在 RStudio Server 或 Cloud 上,选中它):
Then call reprex()
, where the default output is formatted for GitHub:
然后调用 reprex()
,其默认输出格式是为 GitHub 准备的:
reprex::reprex()
A nicely rendered HTML preview will display in RStudio’s Viewer (if you’re in RStudio) or your default browser otherwise. The reprex is automatically copied to your clipboard (on RStudio Server or Cloud, you will need to copy this yourself):
一个渲染精美的 HTML 预览将显示在 RStudio 的 Viewer 窗格中(如果你在 RStudio 中)或者你的默认浏览器中。 reprex 会自动复制到你的剪贴板(在 RStudio Server 或 Cloud 上,你需要自己复制):
``` r
y <- 1:4
mean(y)
#> [1] 2.5
```
This text is formatted in a special way, called Markdown, which can be pasted to sites like StackOverflow or Github and they will automatically render it to look like code. Here’s what that Markdown would look like rendered on GitHub:
这段文本以一种特殊的方式格式化,称为 Markdown,可以粘贴到像 StackOverflow 或 Github 这样的网站上,它们会自动将其渲染成代码的样子。 以下是该 Markdown 在 GitHub 上渲染后的样子:
y <- 1:4
mean(y)
#> [1] 2.5
Anyone else can copy, paste, and run this immediately.
任何其他人都可以立即复制、粘贴并运行这段代码。
There are three things you need to include to make your example reproducible: required packages, data, and code.
要使你的示例可复现,你需要包含三样东西:所需的包、数据和代码。
Packages should be loaded at the top of the script so it’s easy to see which ones the example needs. This is a good time to check that you’re using the latest version of each package; you may have discovered a bug that’s been fixed since you installed or last updated the package. For packages in the tidyverse, the easiest way to check is to run
tidyverse_update()
.
包 (Packages) 应该在脚本的顶部加载,这样可以很容易地看到示例需要哪些包。 这是一个检查你是否正在使用每个包最新版本的好时机;你可能发现了一个在你安装或上次更新包之后已经被修复的 bug。 对于 tidyverse 中的包,最简单的检查方法是运行tidyverse_update()
。-
The easiest way to include data is to use
dput()
to generate the R code needed to recreate it. For example, to recreate themtcars
dataset in R, perform the following steps:
包含 数据 (data) 最简单的方法是使用dput()
生成重现数据所需的 R 代码。 例如,要在 R 中重现mtcars
数据集,请执行以下步骤:- Run
dput(mtcars)
in R - Copy the output
- In reprex, type
mtcars <-
, then paste.
Try to use the smallest subset of your data that still reveals the problem.
尽量使用仍能揭示问题的最小数据子集。 - Run
-
Spend a little bit of time ensuring that your code is easy for others to read:
花一点时间确保你的 代码 (code) 便于他人阅读:Make sure you’ve used spaces and your variable names are concise yet informative.
确保你使用了空格,并且你的变量名简洁而信息丰富。Use comments to indicate where your problem lies.
使用注释来指出你的问题所在。Do your best to remove everything that is not related to the problem.
尽力删除所有与问题无关的内容。
The shorter your code is, the easier it is to understand and the easier it is to fix.
你的代码越短,就越容易理解,也越容易修复。
Finish by checking that you have actually made a reproducible example by starting a fresh R session and copying and pasting your script.
最后,通过启动一个新的 R 会话并复制粘贴你的脚本,来检查你是否确实创建了一个可复现的示例。
Creating reprexes is not trivial, and it will take some practice to learn to create good, truly minimal reprexes. However, learning to ask questions that include the code, and investing the time to make it reproducible will continue to pay off as you learn and master R.
创建 reprex 并非易事,需要一些练习才能学会创建好的、真正最小化的 reprex。 然而,学会提出包含代码的问题,并投入时间使其可复现,将在你学习和掌握 R 的过程中持续带来回报。
8.3 Investing in yourself
You should also spend some time preparing yourself to solve problems before they occur. Investing a little time in learning R each day will pay off handsomely in the long run. One way is to follow what the tidyverse team is doing on the tidyverse blog. To keep up with the R community more broadly, we recommend reading R Weekly: it’s a community effort to aggregate the most interesting news in the R community each week.
你也应该花一些时间为自己做好准备,以便在问题发生前就能解决它们。 每天投入一点时间学习 R,从长远来看将获得丰厚的回报。 一种方法是在 tidyverse 博客 上关注 tidyverse 团队的动态。 为了更广泛地了解 R 社区的动态,我们推荐阅读 R Weekly:这是一个社区项目,每周汇总 R 社区最有趣的新闻。
8.4 Summary
This chapter concludes the Whole Game part of the book. You’ve now seen the most important parts of the data science process: visualization, transformation, tidying and importing. Now you’ve got a holistic view of the whole process, and we start to get into the details of small pieces.
本章结束了本书的“全局概览”部分。 你现在已经看到了数据科学流程中最重要的部分:可视化、转换、整理和导入。 现在你对整个流程有了全面的了解,我们将开始深入探讨各个小部分的细节。
The next part of the book, Visualize, does a deeper dive into the grammar of graphics and creating data visualizations with ggplot2, showcases how to use the tools you’ve learned so far to conduct exploratory data analysis, and introduces good practices for creating plots for communication.
本书的下一部分“可视化”,将更深入地探讨图形语法和使用 ggplot2 创建数据可视化,展示如何使用你目前学到的工具进行探索性数据分析,并介绍创建用于交流的图表的良好实践。