6 Workflow: scripts and projects

This chapter will introduce you to two essential tools for organizing your code: scripts and projects.
本章将向你介绍两个组织代码的基本工具：脚本和项目。

6.1 Scripts

So far, you have used the console to run code. That’s a great place to start, but you’ll find it gets cramped pretty quickly as you create more complex ggplot2 graphics and longer dplyr pipelines. To give yourself more room to work, use the script editor. Open it up by clicking the File menu, selecting New File, then R script, or using the keyboard shortcut Cmd/Ctrl + Shift + N. Now you’ll see four panes, as in Figure 6.1. The script editor is a great place to experiment with your code. When you want to change something, you don’t have to re-type the whole thing, you can just edit the script and re-run it. And once you have written code that works and does what you want, you can save it as a script file to easily return to later.
到目前为止，你一直在使用控制台 (console) 运行代码。这是一个很好的起点，但当你创建更复杂的 ggplot2 图形和更长的 dplyr 管道时，你会发现它很快就会变得拥挤不堪。为了给自己更多的工作空间，请使用脚本编辑器。通过点击文件菜单，选择新建文件，然后选择 R 脚本来打开它，或者使用键盘快捷键 Cmd/Ctrl + Shift + N。现在你会看到四个窗格，如 Figure 6.1 所示。脚本编辑器是试验代码的绝佳场所。当你想改变某些东西时，你不必重新输入全部内容，只需编辑脚本并重新运行即可。而且，一旦你编写了能正常工作并实现你想要的功能的代码，你可以将其保存为脚本文件，以便日后轻松返回。

RStudio IDE with Editor, Console, and Output highlighted. — Figure 6.1: Opening the script editor adds a new pane at the top-left of the IDE.

6.1.1 Running code

The script editor is an excellent place for building complex ggplot2 plots or long sequences of dplyr manipulations. The key to using the script editor effectively is to memorize one of the most important keyboard shortcuts: Cmd/Ctrl + Enter. This executes the current R expression in the console. For example, take the code below.
脚本编辑器是构建复杂 ggplot2 图或长序列 dplyr 操作的绝佳场所。有效使用脚本编辑器的关键是记住一个最重要的键盘快捷键：Cmd/Ctrl + Enter。这会在控制台中执行当前的 R 表达式。例如，看下面的代码。

library(dplyr)
library(nycflights13)

not_cancelled <- flights |> 
  filter(!is.na(dep_delay)█, !is.na(arr_delay))

not_cancelled |> 
  group_by(year, month, day) |> 
  summarize(mean = mean(dep_delay))

If your cursor is at █, pressing Cmd/Ctrl + Enter will run the complete command that generates not_cancelled. It will also move the cursor to the following statement (beginning with not_cancelled |>). That makes it easy to step through your complete script by repeatedly pressing Cmd/Ctrl + Enter.
如果你的光标在 █ 处，按下 Cmd/Ctrl + Enter 将运行生成 not_cancelled 的完整命令。它还会将光标移动到下一个语句（以 not_cancelled |> 开头）。这样，通过重复按 Cmd/Ctrl + Enter，你就可以轻松地逐步执行整个脚本。

Instead of running your code expression-by-expression, you can also execute the complete script in one step with Cmd/Ctrl + Shift + S. Doing this regularly is a great way to ensure that you’ve captured all the important parts of your code in the script.
除了逐个表达式运行代码，你还可以通过 Cmd/Ctrl + Shift + S 一步执行整个脚本。定期这样做是确保你已将所有重要代码部分保存在脚本中的好方法。

We recommend you always start your script with the packages you need. That way, if you share your code with others, they can easily see which packages they need to install. Note, however, that you should never include install.packages() in a script you share. It’s inconsiderate to hand off a script that will change something on their computer if they’re not being careful!
我们建议你始终在脚本的开头声明所需的包。这样，如果你与他人共享代码，他们可以轻松看到需要安装哪些包。但是请注意，你永远不应该在你分享的脚本中包含 install.packages()。如果别人不小心，你的脚本可能会在他们的计算机上做出更改，递交这样的脚本是不体贴的行为！

When working through future chapters, we highly recommend starting in the script editor and practicing your keyboard shortcuts. Over time, sending code to the console in this way will become so natural that you won’t even think about it.
在学习后续章节时，我们强烈建议你从脚本编辑器开始，并练习你的键盘快捷键。随着时间的推移，以这种方式向控制台发送代码将变得如此自然，以至于你甚至不会去想它。

6.1.2 RStudio diagnostics

In the script editor, RStudio will highlight syntax errors with a red squiggly line and a cross in the sidebar:
在脚本编辑器中，RStudio 会用红色波浪线和侧边栏中的叉号来高亮显示语法错误：

Script editor with the script x y <- 10. A red X indicates that there is syntax error. The syntax error is also highlighted with a red squiggly line.

Hover over the cross to see what the problem is:
将鼠标悬停在叉号上查看问题所在：

Script editor with the script x y <- 10. A red X indicates that there is syntax error. The syntax error is also highlighted with a red squiggly line. Hovering over the X shows a text box with the text unexpected token y and unexpected token <-.

RStudio will also let you know about potential problems:
RStudio 也会让你知道潜在的问题：

Script editor with the script 3 == NA. A yellow exclamation mark indicates that there may be a potential problem. Hovering over the exclamation mark shows a text box with the text use is.na to check whether expression evaluates to NA.

6.1.3 Saving and naming

RStudio automatically saves the contents of the script editor when you quit, and automatically reloads it when you re-open. Nevertheless, it’s a good idea to avoid Untitled1, Untitled2, Untitled3, and so on and instead save your scripts and to give them informative names.
当你退出 RStudio 时，它会自动保存脚本编辑器的内容，并在你重新打开时自动重新加载。尽管如此，最好还是避免使用 Untitled1、Untitled2、Untitled3 等名称，而是保存你的脚本并给它们起一些信息丰富的名字。

It might be tempting to name your files code.R or myscript.R, but you should think a bit harder before choosing a name for your file. Three important principles for file naming are as follows:
你可能会想把你的文件命名为 code.R 或 myscript.R，但在选择文件名之前，你应该再多想一想。文件命名的三个重要原则如下：

File names should be machine readable: avoid spaces, symbols, and special characters. Don’t rely on case sensitivity to distinguish files.
文件名应便于机器读取：避免使用空格、符号和特殊字符。不要依赖大小写区分文件。
File names should be human readable: use file names to describe what’s in the file.
文件名应便于人类阅读：用文件名描述文件内容。
File names should play well with default ordering: start file names with numbers so that alphabetical sorting puts them in the order they get used.
文件名应便于默认排序：用数字开头，这样按字母排序时文件会按使用顺序排列。

For example, suppose you have the following files in a project folder.
例如，假设你的项目文件夹中有以下文件。

alternative model.R
code for exploratory analysis.r
finalreport.qmd
FinalReport.qmd
fig 1.png
Figure_02.png
model_first_try.R
run-first.r
temp.txt

There are a variety of problems here: it’s hard to find which file to run first, file names contain spaces, there are two files with the same name but different capitalization (finalreport vs. FinalReport¹), and some names don’t describe their contents (run-first and temp).
这里存在各种问题：很难找到首先要运行哪个文件，文件名包含空格，有两个名称相同但大小写不同的文件（finalreport vs. FinalReport¹），而且有些名称没有描述其内容（run-first 和 temp）。

Here’s a better way of naming and organizing the same set of files:
这里有一种更好的命名和组织同一组文件的方式：

01-load-data.R
02-exploratory-analysis.R
03-model-approach-1.R
04-model-approach-2.R
fig-01.png
fig-02.png
report-2022-03-20.qmd
report-2022-04-02.qmd
report-draft-notes.txt

Numbering the key scripts makes it obvious in which order to run them and a consistent naming scheme makes it easier to see what varies. Additionally, the figures are labelled similarly, the reports are distinguished by dates included in the file names, and temp is renamed to report-draft-notes to better describe its contents. If you have a lot of files in a directory, taking organization one step further and placing different types of files (scripts, figures, etc.) in different directories is recommended.
对关键脚本进行编号可以清楚地表明运行它们的顺序，而一致的命名方案也更容易看出不同之处。此外，图形的标签也类似，报告通过文件名中包含的日期来区分，temp 被重命名为 report-draft-notes 以更好地描述其内容。如果一个目录中有很多文件，建议将组织工作更进一步，将不同类型的文件（脚本、图形等）放在不同的目录中。

6.2 Projects

One day, you will need to quit R, go do something else, and return to your analysis later. One day, you will be working on multiple analyses simultaneously and you want to keep them separate. One day, you will need to bring data from the outside world into R and send numerical results and figures from R back out into the world.
总有一天，你需要退出 R，去做别的事情，然后再回到你的分析中。总有一天，你会同时进行多个分析，并且你希望将它们分开。总有一天，你需要将外部世界的数据导入 R，并将数值结果和图表从 R 导出到外部世界。

To handle these real life situations, you need to make two decisions:
为了处理这些现实生活中的情况，你需要做出两个决定：

What is the source of truth? What will you save as your lasting record of what happened?
事实来源是什么？你会保存什么作为你所做事情的永久记录？
Where does your analysis live?
你的分析存放在哪里？

6.2.1 What is the source of truth?

As a beginner, it’s okay to rely on your current Environment to contain all the objects you have created throughout your analysis. However, to make it easier to work on larger projects or collaborate with others, your source of truth should be the R scripts. With your R scripts (and your data files), you can recreate the environment. With only your environment, it’s much harder to recreate your R scripts: you’ll either have to retype a lot of code from memory (inevitably making mistakes along the way) or you’ll have to carefully mine your R history.
作为初学者，依赖你当前的环境 (Environment) 来包含你在整个分析过程中创建的所有对象是可以的。然而，为了更容易地处理大型项目或与他人协作，你的事实来源应该是 R 脚本。有了你的 R 脚本（和你的数据文件），你就可以重现环境。如果只有你的环境，要重现你的 R 脚本就困难得多：你要么必须凭记忆重新输入大量代码（这不可避免地会出错），要么就得仔细挖掘你的 R 历史记录。

To help keep your R scripts as the source of truth for your analysis, we highly recommend that you instruct RStudio not to preserve your workspace between sessions. You can do this either by running usethis::use_blank_slate()² or by mimicking the options shown in Figure 6.2. This will cause you some short-term pain, because now when you restart RStudio, it will no longer remember the code that you ran last time nor will the objects you created or the datasets you read be available to use. But this short-term pain saves you long-term agony because it forces you to capture all important procedures in your code. There’s nothing worse than discovering three months after the fact that you’ve only stored the results of an important calculation in your environment, not the calculation itself in your code.
为了帮助保持你的 R 脚本作为你分析的事实来源，我们强烈建议你指示 RStudio 在会话之间不要保存你的工作区。你可以通过运行 usethis::use_blank_slate()¹ 或模仿 Figure 6.2 中显示的选项来做到这一点。这会给你带来一些短期的痛苦，因为现在当你重新启动 RStudio 时，它将不再记得你上次运行的代码，你创建的对象或读取的数据集也无法使用。但这种短期的痛苦可以为你省去长期的折磨，因为它迫使你将所有重要的过程都记录在你的代码中。没有什么比在三个月后发现你只在环境中存储了重要计算的结果，而没有在代码中存储计算本身更糟糕的了。

RStudio Global Options window where the option Restore .RData into workspace at startup is not checked. Also, the option Save workspace to .RData on exit is set to Never. — Figure 6.2: Copy these options in your RStudio options to always start your RStudio session with a clean slate.

There is a great pair of keyboard shortcuts that will work together to make sure you’ve captured the important parts of your code in the editor:
有一对很棒的键盘快捷键可以协同工作，确保你已经将代码的重要部分保存在编辑器中：

Press Cmd/Ctrl + Shift + 0/F10 to restart R.
按下 Cmd/Ctrl + Shift + 0/F10 重启 R。
Press Cmd/Ctrl + Shift + S to re-run the current script.
按下 Cmd/Ctrl + Shift + S 重新运行当前脚本。

We collectively use this pattern hundreds of times a week.
我们每周会集体使用这个模式数百次。

Alternatively, if you don’t use keyboard shortcuts, you can go to Session > Restart R and then highlight and re-run your current script.
或者，如果你不使用键盘快捷键，你可以转到 Session > Restart R，然后高亮并重新运行你当前的脚本。

RStudio server

If you’re using RStudio server, your R session is never restarted by default. When you close your RStudio server tab, it might feel like you’re closing R, but the server actually keeps it running in the background. The next time you return, you’ll be in exactly the same place you left. This makes it even more important to regularly restart R so that you’re starting with a clean slate.
如果你正在使用 RStudio 服务器，你的 R 会话默认情况下永远不会重启。当你关闭 RStudio 服务器的标签页时，可能感觉像是关闭了 R，但服务器实际上在后台保持它运行。下次你回来时，你将正好在你离开的地方。这使得定期重启 R 以便从一个干净的状态开始变得更加重要。

6.2.2 Where does your analysis live?

R has a powerful notion of the working directory. This is where R looks for files that you ask it to load, and where it will put any files that you ask it to save. RStudio shows your current working directory at the top of the console:
R 有一个强大的概念，叫做工作目录 (working directory)。这是 R 寻找你要求它加载的文件的位置，也是它存放你要求它保存的任何文件的位置。 RStudio 在控制台的顶部显示你当前的工作目录：

The Console tab shows the current working directory as ~/Documents/r4ds.

And you can print this out in R code by running getwd():
你也可以在 R 代码中通过运行 getwd() 来打印出这个路径：

getwd()
#> [1] "/Users/hadley/Documents/r4ds"

In this R session, the current working directory (think of it as “home”) is in hadley’s Documents folder, in a subfolder called r4ds. This code will return a different result when you run it, because your computer has a different directory structure than Hadley’s!
在这个 R 会话中，当前的工作目录（可以把它想象成“家”）在 hadley 的 Documents 文件夹下的一个名为 r4ds 的子文件夹中。当你运行这段代码时，会返回一个不同的结果，因为你的计算机目录结构与 Hadley 的不同！

As a beginning R user, it’s OK to let your working directory be your home directory, documents directory, or any other weird directory on your computer. But you’re more than a handful of chapters into this book, and you’re no longer a beginner. Very soon now you should evolve to organizing your projects into directories and, when working on a project, set R’s working directory to the associated directory.
作为 R 的初学者，让你的工作目录是你的主目录、文档目录或你电脑上的任何其他奇怪目录都是可以的。但你已经读了这本书好几章了，你不再是初学者了。很快你就应该进化到将你的项目组织到目录中，并且在处理一个项目时，将 R 的工作目录设置为相关的目录。

You can set the working directory from within R but we do not recommend it:
你可以在 R 内部设置工作目录，但我们不推荐这样做：

setwd("/path/to/my/CoolProject")

There’s a better way; a way that also puts you on the path to managing your R work like an expert. That way is the RStudio project.
有更好的方法；一种能让你像专家一样管理你的 R 工作的途径。那就是 RStudio 项目 (project)。

6.2.3 RStudio projects

Keeping all the files associated with a given project (input data, R scripts, analytical results, and figures) together in one directory is such a wise and common practice that RStudio has built-in support for this via projects. Let’s make a project for you to use while you’re working through the rest of this book. Click File > New Project, then follow the steps shown in Figure 6.3.
将与特定项目相关的所有文件（输入数据、R 脚本、分析结果和图表）都放在一个目录中是一种非常明智和普遍的做法，RStudio 通过项目为这种做法提供了内置支持。让我们为你创建一个项目，以便你在学习本书其余部分时使用。点击 File > New Project，然后按照 Figure 6.3 中显示的步骤操作。

Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown and the directory name is given as r4ds and the project is being created as subdirectory of the Desktop. — Figure 6.3: To create new project: (top) first click New Directory, then (middle) click New Project, then (bottom) fill in the directory (project) name, choose a good subdirectory for its home and click Create Project.

Call your project r4ds and think carefully about which subdirectory you put the project in. If you don’t store it somewhere sensible, it will be hard to find it in the future!
将你的项目命名为 r4ds，并仔细考虑将项目放在哪个子目录中。如果你不把它存放在一个合理的地方，将来会很难找到它！

Once this process is complete, you’ll get a new RStudio project just for this book. Check that the “home” of your project is the current working directory:
一旦这个过程完成，你将为这本书得到一个新的 RStudio 项目。检查一下你项目的“家”是否是当前的工作目录：

getwd()
#> [1] /Users/hadley/Documents/r4ds

Now enter the following commands in the script editor, and save the file, calling it “diamonds.R”. Then, create a new folder called “data”. You can do this by clicking on the “New Folder” button in the Files pane in RStudio. Finally, run the complete script which will save a PNG and CSV file into your project directory. Don’t worry about the details, you’ll learn them later in the book.
现在在脚本编辑器中输入以下命令，并保存文件，命名为“diamonds.R”。然后，创建一个名为“data”的新文件夹。你可以通过点击 RStudio 文件窗格中的“New Folder”按钮来完成此操作。最后，运行整个脚本，这会将一个 PNG 和一个 CSV 文件保存到你的项目目录中。别担心细节，你将在书的后面学习它们。

library(tidyverse)

ggplot(diamonds, aes(x = carat, y = price)) + 
  geom_hex()
ggsave("diamonds.png")

write_csv(diamonds, "data/diamonds.csv")

Quit RStudio. Inspect the folder associated with your project — notice the .Rproj file. Double-click that file to re-open the project. Notice you get back to where you left off: it’s the same working directory and command history, and all the files you were working on are still open. Because you followed our instructions above, you will, however, have a completely fresh environment, guaranteeing that you’re starting with a clean slate.
退出 RStudio。检查与你的项目关联的文件夹——注意那个 .Rproj 文件。双击该文件以重新打开项目。你会发现你回到了离开时的地方：工作目录和命令历史记录都相同，你正在处理的所有文件仍然是打开的。然而，因为你遵循了我们上面的指示，你将拥有一个全新的环境，保证你是从一个干净的状态开始。

In your favorite OS-specific way, search your computer for diamonds.png and you will find the PNG (no surprise) but also the script that created it (diamonds.R). This is a huge win! One day, you will want to remake a figure or just understand where it came from. If you rigorously save figures to files with R code and never with the mouse or the clipboard, you will be able to reproduce old work with ease!
用你喜欢的特定于操作系统的方式，在你的电脑上搜索 diamonds.png，你会找到这个 PNG 文件（不奇怪），但同时也会找到创建它的脚本 (diamonds.R)。这是一个巨大的胜利！总有一天，你会想要重做一个图表，或者只是想了解它是怎么来的。如果你严格地用 R 代码将图表保存到文件，而不是用鼠标或剪贴板，你将能够轻松地重现旧的工作！

6.2.4 Relative and absolute paths

Once you’re inside a project, you should only ever use relative paths not absolute paths. What’s the difference? A relative path is relative to the working directory, i.e. the project’s home. When Hadley wrote data/diamonds.csv above it was a shortcut for /Users/hadley/Documents/r4ds/data/diamonds.csv. But importantly, if Mine ran this code on her computer, it would point to /Users/Mine/Documents/r4ds/data/diamonds.csv. This is why relative paths are important: they’ll work regardless of where the R project folder ends up.
一旦你进入一个项目中，你应该只使用相对路径 (relative paths)，而不是绝对路径 (absolute paths)。有什么区别呢？相对路径是相对于工作目录的，也就是项目的主目录。当 Hadley 在上面写 data/diamonds.csv 时，它是 /Users/hadley/Documents/r4ds/data/diamonds.csv 的一个快捷方式。但重要的是，如果 Mine 在她的电脑上运行这段代码，它将指向 /Users/Mine/Documents/r4ds/data/diamonds.csv。这就是为什么相对路径很重要：无论 R 项目文件夹最终在哪里，它们都能工作。

Absolute paths point to the same place regardless of your working directory. They look a little different depending on your operating system. On Windows they start with a drive letter (e.g., C:) or two backslashes (e.g., \\servername) and on Mac/Linux they start with a slash “/” (e.g., /users/hadley). You should never use absolute paths in your scripts, because they hinder sharing: no one else will have exactly the same directory configuration as you.
无论你的工作目录是什么，绝对路径都指向同一个地方。它们根据你的操作系统看起来有些不同。在 Windows 上，它们以驱动器号（例如 C:）或两个反斜杠（例如 \\servername）开头，而在 Mac/Linux 上，它们以斜杠“/”开头（例如 /users/hadley）。你永远不应该在你的脚本中使用绝对路径，因为它们会妨碍共享：没有其他人会拥有与你完全相同的目录配置。

There’s another important difference between operating systems: how you separate the components of the path. Mac and Linux uses slashes (e.g., data/diamonds.csv) and Windows uses backslashes (e.g., data\diamonds.csv). R can work with either type (no matter what platform you’re currently using), but unfortunately, backslashes mean something special to R, and to get a single backslash in the path, you need to type two backslashes! That makes life frustrating, so we recommend always using the Linux/Mac style with forward slashes.
操作系统之间还有另一个重要的区别：你如何分隔路径的组成部分。 Mac 和 Linux 使用正斜杠（例如 data/diamonds.csv），而 Windows 使用反斜杠（例如 data\diamonds.csv）。 R 可以处理这两种类型（无论你当前使用的是哪个平台），但不幸的是，反斜杠对 R 来说有特殊含义，要在路径中得到一个反斜杠，你需要输入两个反斜杠！这让生活变得令人沮丧，所以我们建议始终使用 Linux/Mac 风格的正斜杠。

6.3 Exercises

Go to the RStudio Tips Twitter account, https://twitter.com/rstudiotips and find one tip that looks interesting. Practice using it!
What other common mistakes will RStudio diagnostics report? Read https://support.posit.co/hc/en-us/articles/205753617-Code-Diagnostics to find out.

6.4 Summary

In this chapter, you’ve learned how to organize your R code in scripts (files) and projects (directories). Much like code style, this may feel like busywork at first. But as you accumulate more code across multiple projects, you’ll learn to appreciate how a little up front organisation can save you a bunch of time down the road.
在本章中，你学习了如何在脚本（文件）和项目（目录）中组织你的 R 代码。就像代码风格一样，这起初可能感觉像是琐碎的工作。但随着你在多个项目中积累了越来越多的代码，你会逐渐体会到，一点点前期的组织工作能在未来为你节省大量时间。

In summary, scripts and projects give you a solid workflow that will serve you well in the future:
总而言之，脚本和项目为你提供了一个坚实的工作流程，这将在未来对你大有裨益：

Create one RStudio project for each data analysis project.
为每个数据分析项目创建一个 RStudio 项目。
Save your scripts (with informative names) in the project, edit them, run them in bits or as a whole. Restart R frequently to make sure you’ve captured everything in your scripts.
在项目中保存你的脚本（并使用信息丰富的名称），编辑它们，分段或整体运行它们。频繁重启 R 以确保你已将所有内容都记录在脚本中。
Only ever use relative paths, not absolute paths.
只使用相对路径，不使用绝对路径。

Then everything you need is in one place and cleanly separated from all the other projects that you are working on.
然后，你需要的一切都在一个地方，并与你正在进行的所有其他项目清晰地分离开来。

So far, we’ve worked with datasets bundled inside of R packages. This makes it easier to get some practice on pre-prepared data, but obviously your data won’t be available in this way. So in the next chapter, you’re going to learn how load data from disk into your R session using the readr package.
到目前为止，我们一直使用 R 包中捆绑的数据集。这使得在预先准备好的数据上进行练习变得更容易，但显然你的数据不会以这种方式提供。因此，在下一章中，你将学习如何使用 readr 包将数据从磁盘加载到你的 R 会话中。

Not to mention that you’re tempting fate by using “final” in the name 😆 The comic Piled Higher and Deeper has a fun strip on this.↩︎
If you don’t have usethis installed, you can install it with install.packages("usethis").↩︎