Using R on Gautschi cluster¶
Introduction to R¶
R, a GNU project, is a language and environment for data manipulation, statistics, and graphics. It is an open-source implementation of the S programming language. R is quickly becoming the language of choice for data science because it makes it easy to produce high-quality plots and data visualizations. It is a versatile platform with a large, growing community and collection of packages.
For more general information on R visit The R Project for Statistical Computing.
Loading Data into R¶
R is an environment for manipulating data. Before data can be manipulated, it must be brought into the R environment. R provides functions for reading most file formats in which data is stored. Functions for the most common file types, such as comma-separated values (CSV) files, come with the basic R packages. Less common file types require additional packages to be installed. To read data from a CSV file into the R environment, call read.csv() with the path to the file at the R prompt.
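A minimal sketch of reading a CSV file is shown here. The file name and its contents are hypothetical; the example writes a small file to a temporary directory first so that it runs anywhere.

```r
# Create a small CSV file so the example is self-contained.
# The file name "example.csv" and its contents are illustrative only.
csv_path <- file.path(tempdir(), "example.csv")
writeLines(c("id,value", "1,3.5", "2,4.25"), csv_path)

# Read the file into R and print the resulting data frame.
read.csv(csv_path)
```

With real data, replace csv_path with the path to your own file.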
When R reads the file it creates an object (a data frame) that can then become the target of other functions. The read.csv() function only returns this object; unless the result is assigned to a name, it is printed to the console and not kept. To keep the object, assign the result of read.csv() to a name of your choice with the <- operator at the R prompt.
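As an illustration, the sketch below stores the result of read.csv() under the name my_data. Both the object name and the temporary file are assumptions made for the example.

```r
# Write a two-row CSV file to a temporary location (stand-in for real data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,3.5", "2,4.25"), csv_path)

# Assign the data frame returned by read.csv() to a name of your choosing.
my_data <- read.csv(csv_path)

# The object can now be passed to other functions.
nrow(my_data)
```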
To display the properties (structure) of loaded data, use the str() function.
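A short sketch of inspecting a data frame follows. It uses the built-in mtcars dataset as a stand-in for data you have loaded yourself; any data frame can be substituted.

```r
# mtcars ships with R and stands in here for your own loaded data.
my_data <- mtcars

# Show the type of each column together with the first few values.
str(my_data)

# Other common ways to inspect the same object.
dim(my_data)    # number of rows and columns
names(my_data)  # column names
head(my_data)   # first six rows
```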
For more functions and tutorials:
Installing R packages¶
Challenges of Managing R Packages in the Cluster Environment¶
- Different clusters have different hardware and software, so if you have access to multiple clusters, you must install your R packages separately on each cluster.
- Each cluster has multiple versions of R, and packages installed with one version of R may not work with another. Libraries for each R version must therefore be installed in a separate directory.
- You can define the directory where your R packages will be installed using the environment variable R_LIBS_USER.
- For your convenience, a sample ~/.Rprofile example file is provided that can be downloaded to your cluster account and renamed to ~/.Rprofile (or appended to an existing one) to customize your installation preferences. Detailed instructions.
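The points above can be sketched in R as follows. This is not the contents of the sample ~/.Rprofile; it is a minimal illustration of keeping one library directory per cluster and per R version. The "~/R/<cluster>/<version>" layout and both function names are assumptions made for the example.

```r
# Build a library path that is unique per cluster and per R version.
# The "<base>/<cluster>/<version>" layout is an illustrative assumption.
versioned_lib_path <- function(base = "~/R", cluster = "gautschi") {
  r_version <- paste(R.version$major, R.version$minor, sep = ".")
  file.path(path.expand(base), cluster, r_version)
}

# Create the directory if needed and put it first on the library search path.
use_versioned_lib <- function(base = "~/R", cluster = "gautschi") {
  lib <- versioned_lib_path(base, cluster)
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib, .libPaths()))
  invisible(lib)
}

# Inspect the current settings without changing anything.
Sys.getenv("R_LIBS_USER")
.libPaths()
```

Calling use_versioned_lib() in a session makes the new directory the first entry of .libPaths(), which is where install.packages() places packages by default.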
Installing Packages¶
- Step 0: Set up installation preferences.
  Follow the steps for setting up your ~/.Rprofile preferences. This step needs to be done only once. If you have previously created a ~/.Rprofile file on Gautschi, skip this step.
- Step 1: Check if the package is already installed.
  As part of the R installations on community clusters, many R packages are pre-installed. You can check if your package is already installed by opening an R terminal and entering the command installed.packages(). If the package you are trying to use is already installed, simply load the library, e.g., library('units'). Otherwise, move to the next step to install the package.
- Step 2: Load required dependencies. (if needed)
  For simple packages you may not need this step. However, some R packages depend on other libraries. For example, the sf package depends on the gdal and geos libraries, so you will need to load the corresponding modules before installing sf. Read the documentation for the package to identify which modules should be loaded.
- Step 3: Install the package.
  Now install the desired package using the command install.packages('package_name'). R will automatically download the package and all its dependencies from CRAN and install each one. Your terminal will show the build progress and eventually report whether the package was installed successfully.
- Step 4: Troubleshooting. (if needed)
  If Step 3 ended with an error, you need to investigate why the build failed. The most common reason for a build failure is that the necessary modules were not loaded.
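Steps 1 and 3 can be combined in a small helper, sketched below. The function names is_installed and install_if_missing are hypothetical, and the CRAN mirror URL is an assumption; the cluster may already configure a preferred mirror.

```r
# Return TRUE if `pkg` is present in any library on the current search path.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

# Install `pkg` from CRAN only when it is missing, then report the result.
# The repository URL is an assumption; your site may use a different mirror.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!is_installed(pkg)) {
    install.packages(pkg, repos = repos)
  }
  is_installed(pkg)
}

# "stats" is a base package, so this check needs no installation.
is_installed("stats")
```

For a package with system dependencies, such as sf, the required modules still have to be loaded before starting R; the helper does not handle that step.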
Loading Libraries¶
Once you have packages installed you can load them with the library() function as shown below:
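A minimal sketch of loading a package is shown here. It uses the tools package, which ships with R, as a stand-in; substitute the name of the package you installed.

```r
# Attach a package so its functions can be called directly.
# "tools" ships with R; replace it with the package you installed.
library("tools")

# Confirm that the package is now on the search path.
"package:tools" %in% search()
```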
The package is now installed and loaded and ready to be used in R.
Example: Installing dplyr¶
The following demonstrates installing the dplyr package assuming the above-mentioned custom ~/.Rprofile is in place (note its effect in the "Installing package into" information message):
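A sketch of such a session follows, assuming CRAN is reachable from the node you are working on. The function name install_and_load_dplyr and the mirror URL are assumptions made for the example. The first entry of .libPaths() is the default install location, which is the directory reported in the "Installing package into" message.

```r
# The first library path is the default target for install.packages().
install_target <- .libPaths()[1]
print(install_target)

# Install dplyr if it is missing, then attach it (requires network access).
# The CRAN mirror URL is an assumption; use the mirror your site prefers.
install_and_load_dplyr <- function(repos = "https://cloud.r-project.org") {
  if (!requireNamespace("dplyr", quietly = TRUE)) {
    install.packages("dplyr", repos = repos)
  }
  library(dplyr)
}
```

Run install_and_load_dplyr() at the R prompt to perform the installation and load the package.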
For more information about installing R packages: