2  Workflow: basics

You now have some experience running R code. We didn’t give you many details, but you’ve obviously figured out the basics, or you would’ve thrown this book away in frustration! Frustration is natural when you start programming in R because it is such a stickler for punctuation, and even one character out of place can cause it to complain. But while you should expect to be a little frustrated, take comfort in that this experience is typical and temporary: it happens to everyone, and the only way to get over it is to keep trying.
你现在已经有了一些运行 R 代码的经验。我们没有提供太多细节,但你显然已经掌握了基础知识,否则你可能早就沮丧地扔掉这本书了!当你开始用 R 编程时,感到沮丧是正常的,因为它对标点符号要求非常严格,哪怕只有一个字符放错位置 (misplaced) 也会导致它报错。但是,尽管你应该预料到会有些沮丧,但请放心,这种经历是典型且暂时的:每个人都会遇到,而克服它的唯一方法就是不断尝试。

Before we go any further, let’s ensure you’ve got a solid foundation in running R code and that you know some of the most helpful RStudio features.
在我们继续深入之前,让我们确保你在运行 R 代码方面有坚实的基础,并且了解一些 RStudio 最有用的功能。

2.1 Coding basics

Let’s review some basics we’ve omitted so far in the interest of getting you plotting as quickly as possible. You can use R to do basic math calculations:
让我们回顾一些为了让你尽快开始绘图而省略的基础知识。你可以使用 R 进行基本的数学计算:

1 / 200 * 30
#> [1] 0.15
(59 + 73 + 2) / 3
#> [1] 44.66667
sin(pi / 2)
#> [1] 1

You can create new objects with the assignment operator <-:
你可以使用赋值运算符 <- 创建新对象:

x <- 3 * 4

Note that the value of x is not printed, it’s just stored. If you want to view the value, type x in the console.
注意,x 的值并不会被打印出来,它只是被储存起来了。如果你想查看它的值,可以在控制台 (console) 中输入 x

You can combine multiple elements into a vector with c():
你可以使用 c() 函数将多个元素 combine (组合) 成一个向量 (vector):

primes <- c(2, 3, 5, 7, 11, 13)

And basic arithmetic on vectors is applied to every element of the vector:
对向量进行的基本算术运算会应用于向量的每个元素:

primes * 2
#> [1]  4  6 10 14 22 26
primes - 1
#> [1]  1  2  4  6 10 12

All R statements where you create objects, assignment statements, have the same form:
所有创建对象的 R 语句,即赋值 (assignment) 语句,都具有相同的形式:

object_name <- value

When reading that code, say “object name gets value” in your head.
在阅读这段代码时,你可以在脑海中默念“对象名得到值”。

You will make lots of assignments, and <- is a pain to type. You can save time with RStudio’s keyboard shortcut: Alt + - (the minus sign). Notice that RStudio automatically surrounds <- with spaces, which is a good code formatting practice. Code can be miserable to read on a good day, so giveyoureyesabreak and use spaces.
你会进行大量的赋值操作,而手动输入 <- 会很麻烦。你可以使用 RStudio 的键盘快捷键来节省时间:Alt + - (减号)。请注意,RStudio 会自动在 <- 两侧添加空格,这是一个很好的代码格式化习惯。即使在状态好的时候,阅读代码也可能是一件痛苦的事情,所以请善待你的眼睛,多使用空格。

2.2 Comments

R will ignore any text after # for that line. This allows you to write comments, text that is ignored by R but read by other humans. We’ll sometimes include comments in examples explaining what’s happening with the code.
R 会忽略该行 # 之后的所有文本。这允许你编写注释 (comments),即被 R 忽略但供人类阅读的文本。我们有时会在示例中加入注释来解释代码的功能。

Comments can be helpful for briefly describing what the following code does.
注释有助于简要描述后续代码的功能。

# create vector of primes
primes <- c(2, 3, 5, 7, 11, 13)

# multiply primes by 2
primes * 2
#> [1]  4  6 10 14 22 26

With short pieces of code like this, leaving a comment for every single line of code might not be necessary. But as the code you’re writing gets more complex, comments can save you (and your collaborators) a lot of time figuring out what was done in the code.
对于像这样简短的代码片段,可能没有必要为每一行代码都写注释。但是,当你编写的代码变得越来越复杂时,注释可以为你(和你的合作者)节省大量时间来理解代码的功能。

Use comments to explain the why of your code, not the how or the what. The what and how of your code are always possible to figure out, even if it might be tedious, by carefully reading it. If you describe every step in the comments, and then change the code, you will have to remember to update the comments as well or it will be confusing when you return to your code in the future.
使用注释来解释你代码的原因 (why),而不是方式 (how) 或内容 (what)。通过仔细阅读代码,总是可以搞清楚代码的内容方式,尽管这可能很乏味。如果你在注释中描述了每一步,然后在修改代码后,你必须记得同时更新注释,否则将来你再回头看代码时会感到困惑。

Figuring out why something was done is much more difficult, if not impossible. For example, geom_smooth() has an argument called span, which controls the smoothness of the curve, with larger values yielding a smoother curve. Suppose you decide to change the value of span from its default of 0.75 to 0.9: it’s easy for a future reader to understand what is happening, but unless you note your thinking in a comment, no one will understand why you changed the default.
搞清楚做某件事的原因要困难得多,甚至是不可能的。例如,geom_smooth() 有一个名为 span 的参数,它控制曲线的平滑度,值越大曲线越平滑。假设你决定将 span 的值从默认的 0.75 更改为 0.9:未来的读者很容易理解发生了什么,但除非你在注释中记录下你的想法,否则没人会明白你为什么要更改默认值。

For data analysis code, use comments to explain your overall plan of attack and record important insights as you encounter them. There’s no way to re-capture this knowledge from the code itself.
对于数据分析代码,使用注释来解释你的整体分析计划,并记录你遇到的重要见解。这些信息是无法单从代码本身重新获取的。

2.3 What’s in a name?

Object names must start with a letter and can only contain letters, numbers, _, and .. You want your object names to be descriptive, so you’ll need to adopt a convention for multiple words. We recommend snake_case, where you separate lowercase words with _.
对象名称必须以字母开头,并且只能包含字母、数字、_.。你希望你的对象名称具有描述性,所以你需要为多个单词的命名采纳一种约定。我们推荐使用蛇形命名法 (snake_case),即用 _ 分隔小写单词。

i_use_snake_case
otherPeopleUseCamelCase
some.people.use.periods
And_aFew.People_RENOUNCEconvention

We’ll return to names again when we discuss code style in Chapter 4.
我们将在 Chapter 4 中讨论代码风格时再次回到命名的话题。

You can inspect an object by typing its name:
你可以通过输入对象名称来查看它:

x
#> [1] 12

Make another assignment:
再进行一次赋值:

this_is_a_really_long_name <- 2.5

To inspect this object, try out RStudio’s completion facility: type “this”, press TAB, add characters until you have a unique prefix, then press return.
要查看这个对象,可以试试 RStudio 的自动补全功能:输入 “this”,按 TAB 键,继续添加字符直到前缀唯一,然后按回车键。

Let’s assume you made a mistake, and that the value of this_is_a_really_long_name should be 3.5, not 2.5. You can use another keyboard shortcut to help you fix it. For example, you can press ↑ to bring the last command you typed and edit it. Or, type “this” then press Cmd/Ctrl + ↑ to list all the commands you’ve typed that start with those letters. Use the arrow keys to navigate, then press enter to retype the command. Change 2.5 to 3.5 and rerun.
假设你犯了一个错误,this_is_a_really_long_name 的值应该是 3.5,而不是 2.5。你可以使用另一个键盘快捷键来帮助你修正它。例如,你可以按 ↑ 键调出你输入的上一条命令并进行编辑。或者,输入 “this”,然后按 Cmd/Ctrl + ↑ 来列出所有以这些字母开头的命令。使用箭头键导航,然后按回车键重新输入该命令。将 2.5 改为 3.5 并重新运行。

Make yet another assignment:
再进行一次赋值:

r_rocks <- 2^3

Let’s try to inspect it:
让我们试着查看它:

r_rock
#> Error: object 'r_rock' not found
R_rocks
#> Error: object 'R_rocks' not found

This illustrates the implied contract between you and R: R will do the tedious computations for you, but in exchange, you must be completely precise in your instructions. If not, you’re likely to get an error that says the object you’re looking for was not found. Typos matter; R can’t read your mind and say, “oh, they probably meant r_rocks when they typed r_rock”. Case matters; similarly, R can’t read your mind and say, “oh, they probably meant r_rocks when they typed R_rocks”.
这说明了你和 R 之间的一个隐含契约:R 会为你完成繁琐的计算,但作为交换,你必须给出完全精确的指令。否则,你很可能会得到一个错误,提示找不到你想要的对象。拼写错误很重要;R 无法读懂你的心思,然后说:“哦,他们输入 r_rock 时可能指的是 r_rocks”。大小写也很重要;同样,R 也无法读懂你的心思,然后说:“哦,他们输入 R_rocks 时可能指的是 r_rocks”。

2.4 Calling functions

R has a large collection of built-in functions that are called like this:
R 有大量内置函数,调用方式如下:

function_name(argument1 = value1, argument2 = value2, ...)

Let’s try using seq(), which makes regular sequences of numbers, and while we’re at it, learn more helpful features of RStudio. Type se and hit TAB. A popup shows you possible completions. Specify seq() by typing more (a q) to disambiguate or by using ↑/↓ arrows to select. Notice the floating tooltip that pops up, reminding you of the function’s arguments and purpose. If you want more help, press F1 to get all the details in the help tab in the lower right pane.
让我们试试 seq() 函数,它可以生成规则的数字列 (sequences),同时我们也可以学习更多 RStudio 的实用功能。输入 se 然后按 TAB 键。一个弹出窗口会显示可能的补全选项。通过输入更多字符(一个 q)来消除歧义,或使用 ↑/↓ 箭头来选择 seq()。注意弹出的浮动提示框,它会提醒你该函数的参数和用途。如果你需要更多帮助,可以按 F1 键,在右下角的帮助 (help) 标签页中获取所有详细信息。

When you’ve selected the function you want, press TAB again. RStudio will add matching opening (() and closing ()) parentheses for you. Type the name of the first argument, from, and set it equal to 1. Then, type the name of the second argument, to, and set it equal to 10. Finally, hit return.
当你选定想要的函数后,再次按 TAB 键。RStudio 会为你添加匹配的开括号 ( 和闭括号 )。输入第一个参数的名称 from,并将其设置为 1。然后,输入第二个参数的名称 to,并将其设置为 10。最后,按回车键。

seq(from = 1, to = 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10

We often omit the names of the first several arguments in function calls, so we can rewrite this as follows:
在函数调用中,我们经常省略前几个参数的名称,所以我们可以像下面这样重写:

seq(1, 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10

Type the following code and notice that RStudio provides similar assistance with the paired quotation marks:
输入以下代码,你会发现 RStudio 对成对的引号也提供了类似的辅助功能:

x <- "hello world"

Quotation marks and parentheses must always come in a pair. RStudio does its best to help you, but it’s still possible to mess up and end up with a mismatch. If this happens, R will show you the continuation character “+”:
引号和括号必须总是成对出现。RStudio 会尽力帮助你,但仍然有可能出错,导致括号或引号不匹配。如果发生这种情况,R 会显示一个续行符 +

> x <- "hello
+

The + tells you that R is waiting for more input; it doesn’t think you’re done yet. Usually, this means you’ve forgotten either a " or a ). Either add the missing pair, or press ESCAPE to abort the expression and try again.
这个 + 告诉你 R 正在等待更多输入;它认为你还没有完成输入。通常,这意味着你忘记了输入一个 ")。你可以补上缺失的符号,或者按 ESCAPE 键中止表达式并重试。

Note that the environment tab in the upper right pane displays all of the objects that you’ve created:
注意,右上角窗格中的环境 (environment) 标签页会显示你创建的所有对象:

Environment tab of RStudio which shows r_rocks, this_is_a_really_long_name,  x, and y in the Global Environment.

2.5 Exercises

2.6 Exercises

  1. Why does this code not work?

    my_variable <- 10
    my_varıable
    #> Error: object 'my_varıable' not found

    Look carefully! (This may seem like an exercise in pointlessness, but training your brain to notice even the tiniest difference will pay off when programming.)

  2. Tweak each of the following R commands so that they run correctly:

    libary(todyverse)
    
    ggplot(dTA = mpg) + 
      geom_point(maping = aes(x = displ y = hwy)) +
      geom_smooth(method = "lm)
  3. Press Option + Shift + K / Alt + Shift + K. What happens? How can you get to the same place using the menus?

  4. Let’s revisit an exercise from the Section 1.6. Run the following lines of code. Which of the two plots is saved as mpg-plot.png? Why?

    my_bar_plot <- ggplot(mpg, aes(x = class)) +
      geom_bar()
    my_scatter_plot <- ggplot(mpg, aes(x = cty, y = hwy)) +
      geom_point()
    ggsave(filename = "mpg-plot.png", plot = my_bar_plot)

2.7 Summary

Now that you’ve learned a little more about how R code works, and some tips to help you understand your code when you come back to it in the future. In the next chapter, we’ll continue your data science journey by teaching you about dplyr, the tidyverse package that helps you transform data, whether it’s selecting important variables, filtering down to rows of interest, or computing summary statistics.
现在你已经对 R 代码的工作原理有了更多了解,也掌握了一些技巧,可以帮助你在未来回顾代码时更好地理解它。在下一章中,我们将继续你的数据科学之旅,教你关于 dplyr 的知识,它是 tidyverse 中的一个包,可以帮助你转换数据,无论是选择重要变量、筛选感兴趣的行,还是计算汇总统计数据。