0. The Basics of R: Starting-up
Introduction
What is R? What is RStudio?
'R' is a programming language, but the name is also used for the software that interprets R code (formally, this software is called the 'R interpreter').
RStudio is an application like Microsoft Word, except that instead of helping you write in English, RStudio helps you write in R. Software like RStudio that helps you write computer code is called an Integrated Development Environment (IDE). RStudio needs R to function correctly, so both must be installed on your computer.
Why learn R?
R does not involve lots of pointing and clicking, but instead relies on a series of written commands. For this reason, the learning curve might be steeper than for some other data analysis software. On the positive side, the results of your analyses in R do not depend on remembering a succession of clicks. That’s a good thing, because if you want to redo your analysis you don’t have to remember which button you clicked in which order to obtain your results; you just have to run your script again.
Working with scripts makes the steps you used in your analysis clear, and the code you write can be inspected by someone else who can give you feedback and spot mistakes.
Working with scripts forces you to have a deeper understanding of what you are doing, and facilitates your learning and comprehension of the methods you use.
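To make the idea of a script concrete, here is a minimal sketch of what a small analysis script might look like. The file name analysis.R, the variable names and the numbers are all made up for illustration; a real script would usually read its data from a file.

```r
# analysis.R: a hypothetical three-step analysis (all values are invented)

# Step 1: the data, typed in directly to keep the sketch self-contained
weights <- c(4.1, 5.3, 3.8, 4.9, 5.0)

# Step 2: summarise the data
avg_weight <- mean(weights)
sd_weight <- sd(weights)

# Step 3: report the result
cat("Mean:", round(avg_weight, 2), "SD:", round(sd_weight, 2), "\n")
```

Every step is written down, so running the script again repeats exactly the same analysis, and a colleague can read it line by line to check what was done.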
R code is great for reproducibility
Reproducibility is when someone else (including your future self) can obtain the same results from the same dataset when using the same analysis.
R integrates with other tools to generate manuscripts from your code. If you collect more data, or fix a mistake in your dataset, the figures and the statistical tests in your manuscript are updated automatically.
Reproducibility matters wherever data is processed and analysed, not just in science but also in industry and government. Knowing R will therefore give you an edge in meeting the demand for reproducible analyses.
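One small ingredient of reproducibility can be shown in a few lines. The sketch below assumes an analysis that involves random sampling (the numbers and variable names are invented for illustration). Fixing the random seed with set.seed() makes the "random" draw come out the same on every run.

```r
# Fix the random seed so that the random draw is repeatable
set.seed(42)
first_run <- sample(1:100, 5)

# Resetting the same seed reproduces exactly the same draw
set.seed(42)
second_run <- sample(1:100, 5)

identical(first_run, second_run)
```

Without the calls to set.seed(), the two samples would almost certainly differ, and someone rerunning the script would not get the same numbers.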
R is interdisciplinary and extensible
With more than 10,000 packages that can be installed to extend its capabilities, R lets you combine statistical approaches from almost any discipline to suit the analysis your data needs. For instance, R has packages for image analysis, GIS, time series, population genetics, text analysis, and a lot more.
R works on data of all shapes and sizes, so regardless of the type of data you have to handle, it can be loaded and processed. The skills you learn with R scale easily with the size of your dataset: whether it consists of 10 values or millions of rows, the code you write stays largely the same.
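As a rough sketch of how packages are used in practice: a package is installed once with install.packages() and then loaded in each session with library(). The package name below (ggplot2, a popular plotting package) is only an example. The code checks whether the package is present rather than installing it, so it runs without an internet connection.

```r
# Name of an example package; any package name could be used here
pkg <- "ggplot2"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
if (requireNamespace(pkg, quietly = TRUE)) {
  message("Package '", pkg, "' is installed; load it with library(", pkg, ")")
} else {
  message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\") once")
}
```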
R is designed for data analysis. It comes with special data structures and data types that make handling of missing data and statistical factors convenient.
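A short sketch of what this means in practice, using made-up measurements: missing values are written as NA, and categorical variables (statistical factors) are stored with factor().

```r
# Hypothetical height measurements; NA marks a missing value
heights <- c(1.62, NA, 1.75, 1.80)

# By default a missing value makes the result missing as well
mean(heights)

# na.rm = TRUE tells mean() to ignore missing values
mean(heights, na.rm = TRUE)

# A factor stores a categorical variable with a fixed set of levels
treatment <- factor(c("control", "drug", "drug", "control"))
levels(treatment)

# Count how many observations fall in each level
table(treatment)
```

Because missing values are part of the language, R forces you to decide explicitly how to treat them instead of silently dropping them.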
R can connect to spreadsheets, databases, and many other data formats, on your computer or on the web.
R produces high-quality graphics
The plotting capabilities of R are extensive and allow you to adjust almost any aspect of your graph so that it conveys the message from your data as effectively as possible.
R has a large and welcoming community
Tens of thousands of people use R daily. Many of them are willing to help you through mailing lists and websites such as Stack Overflow or the RStudio Community.
On top of that, there are countless books and tutorials (two incomplete lists by r-project & Roman Tsegelskyi) which explain the use of R in many application domains.
Not only is R free, but it is also open-source and cross-platform
Anyone can inspect the source code to see how R works. This transparency makes mistakes less likely to go unnoticed, and if you (or someone else) find a bug, you can report it or fix it.
You can interact with R/RStudio in different ways
You can use R and RStudio online (without installing any software) as well as offline (by downloading and installing first R and then RStudio). For this course we ask you to do both: set up your online account and also install R and RStudio on your computer. The subsequent sub-chapters explain how to do this.
R is going to be around for a long time and will be important in your future study & work
Because the user community is so large and there is so much code and good support for R, it is going to be an important platform in the coming decade. It will be used in many courses during your studies and will quite likely also be relevant in your future work. If you work with another programming language in the future, already knowing R will be a great benefit: programming languages and IDEs all work in a similar way, and knowing one makes it easier to pick up another.
Points to keep in mind
- R and RStudio are great tools that are used throughout research, government and industry.
- It is not the language or software per se that makes R attractive to work with; it is above all the large, interdisciplinary and open-minded user community that makes it stand out.
- There are many books and tutorials available which will help you master R in case you want to continue learning beyond what is offered in this course.