[0:10]Hi guys, welcome back to my channel. I'm so glad to see you all here. Today I will be talking about a very basic concept and demonstrating how to install packages in R. This video will be useful for anyone who is starting out in bioinformatics and trying to perform analysis in R by themselves. If you have been following the videos on my channel, you know that we use a lot of R packages, and if you have tried to replicate some of my analysis, at some point you will have tried to install these packages yourself. Until now, I have never explained what CRAN is or what Bioconductor is. So today I thought I'd talk a little more about what these things are and about the various ways to install packages in R. Hopefully, that will also help you troubleshoot any errors you see while installing these packages yourself. R packages are collections of code and functions developed by the community. R has extensive support for bioinformatics: it has an active community that constantly develops packages for bioinformatics analysis, and that is one of the reasons why I prefer coding in R for this kind of work. Code that the community has already written can be leveraged by installing it on your system and then using it to perform analysis on your data sets. R packages can be deposited at various locations. One of the most commonly used is CRAN. CRAN is the official repository for R packages, and you can think of it as an App Store. The App Store on your phone holds various apps, and you install individual apps based on what you need to do.
Similarly, CRAN holds most R packages. You can go there, look up the packages you need, and install them on your system to perform specific analysis. The other location where code can be hosted is GitHub. GitHub is not R-specific, but it is probably the most popular host for open-source projects, and a lot of code for bioinformatics analysis is hosted there. We may have downloaded packages from GitHub in past videos, and if not, we will be downloading packages from GitHub in future ones. The third location is Bioconductor. The Bioconductor repository is very bioinformatics-centric. It is meant to hold packages so that open-source software is accessible to all bioinformaticians, or to anyone trying to perform bioinformatics analysis. Like CRAN, it has its own submission and review process, its packages are well curated, and its community is very active. Today, I intend to demonstrate how to install packages from each of these sources. There are various ways to install packages in R. You can use the function install.packages(), you can use the graphical user interface of RStudio, you can download the source file for a package and install from that, or you can use other packages like devtools, remotes, or BiocManager. I will be demonstrating each of these methods in further detail in a bit in RStudio. Here is a little diagram of how to install packages in R from these sources, as this can be confusing for anyone who is new to installing packages in R. These are the three locations that bioinformaticians primarily get code from.
We discussed these three sources in some detail a moment ago.
[4:20]Let us first look at how to install packages from CRAN. To do that, we can use an R function called install.packages(). But what if we want to install packages from GitHub or from Bioconductor? Let us take GitHub first. In that case, we need an additional package, devtools or remotes, that can help us install code or packages from GitHub. So before we jump into installing packages from GitHub, we first download devtools from CRAN, and once we have it, we can use it to install other packages from GitHub. Similarly, for Bioconductor, we need a package called BiocManager, which is available on CRAN. We first install BiocManager, and then, using BiocManager, we can install packages from Bioconductor. If this is still a little confusing, the demonstration should hopefully clear it up. So, here is my RStudio. For anyone who is new to R, RStudio is software that lets you write your R scripts and see their output in one place. It shows four panes, and their order can be different on your machine. One pane is where you write your code. The pane at the bottom, for me, is the console, where you see the output of the code. The top-right pane is the environment, which shows any variables, data frames, or tables that you have created or read data into, so you can see all of those data structures here.
And here is the fourth pane, which shows a lot of information: the file structure, that is, which folder you are in; any plot you have generated; the list of packages installed on your system; and Help, Viewer, and so on. Let us start by loading a library. Packages are loaded with the library() function, which is why they are often called libraries. I want to load a library called readxl, which allows me to read Excel files in R. When I try to load it, I get an error that says there is no package called readxl. This is a classic error in R. It tells you that R cannot find a package called readxl, so you have to make sure the package is installed on your system before you use it. Let us go to the Packages pane here and search for readxl. As you can see, the search returns nothing, which means that readxl is not installed on my system. I know that readxl is a CRAN package, so I can install it by writing install.packages() with the name of the package, readxl, in quotes. Single or double quotes both work. Now I run this line. It seems that it has installed the package. So let us try to load the library again, and we should not see the error. When we load the library, we just get a warning that says the package was built under R version 4.1.2, but other than that, we do not get any error messages. That means the package has been installed successfully, and I can now use the functions that are part of this package. If you notice, readxl now shows up in the Packages pane.
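The console steps just described (hit the "no package called" error, install from CRAN, load again) can be sketched roughly as below. This is a minimal sketch and not the exact code typed in the video: `install_if_missing` is a hypothetical helper name introduced here so that the install only runs when the package is actually absent.

```r
# Hypothetical helper (not part of base R): install a CRAN package only when
# it cannot already be found in any of the library paths.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Same call as in the video; the package name is passed as a string.
    install.packages(pkg)
  }
  # TRUE when the package can be found after the (possible) install.
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

Calling `install_if_missing("readxl")` and then `library(readxl)` would mirror the sequence shown in the video. The install step needs an internet connection and a reachable CRAN mirror.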
So now you can see that it is available on our system, and hence readxl pops up when we search for it in the Packages tab. Another way to install CRAN packages is by using the graphical user interface of RStudio. If you look at the menu bar, you will find Tools. If you click on Tools, the first option is Install Packages. When you click on Install Packages, a small dialog box opens. The first option, a drop-down, is where to install packages from. It can be CRAN, or it can be a package archive file that you have downloaded. So you can choose CRAN here. In the Packages text box, you write the name of the package. readxl is part of CRAN, so it pops up as a suggestion. Once you have written the name and selected the package that you want to install, you can also choose the location of the library. The default library for me is this path, so all my CRAN packages will be stored at this path, and then I can click on Install. I'm not going to install it again because I just installed readxl, but this is another way to download and install packages in R. Next, you can also choose to install from a local file. Let's say you have downloaded a package's source file onto your system, and you want to install the package from that source file. In that case, you first have to download the source file. So let us go to CRAN again. This is what the CRAN page looks like, and from here, you can download R as well. Let us go to the Packages page. When I click on the table of available packages, I see all the available packages along with their dates. There are a lot of packages, as you can tell from how small the scroll bar is. So I'm just going to search for the package that I'm interested in, which is stringr.
I want to install this package, but I want to install it from the source file. This is the package source, a .tar.gz file. When I click here, it lets me download this file onto my system. I'm not going to do that because I've already downloaded the file, but you can download it by clicking here and saving it on your system. Once you have downloaded the file, you can install the package by giving the local path to the source file. The way you do it is you again use the function install.packages(), and the first parameter you provide is the path to the source file, including the name of the source file. My source file is saved here, so I'm going to copy the path to the source file as well as its name. This is the source file that we downloaded, so I'm going to provide the complete path and the file name, and I'm going to set repos to NULL. Now we're going to install this. I do not see any errors here, just the logs of it being installed. So when I try to load the library, it should load without any error. And as you can see, we loaded the library stringr and we do not see any errors, which means that the stringr package has been installed successfully from the local file. Just like CRAN packages, local packages can also be installed using the graphical user interface. Again, you go to the menu bar of RStudio, click on Tools, and click on Install Packages. From the Install from drop-down, you choose Package Archive File, and when you click on Browse, you can browse to the location where you have stored the source file. It will populate the path and the name of the source file, and when you click on Install, it will install the package from that location onto your system.
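The local source install described above might look something like the following. This is a hedged sketch: `install_from_tarball` is a hypothetical helper name, and the `type = "source"` argument is an addition that is not shown in the video (the video only sets `repos` to `NULL`).

```r
# Hypothetical helper: install a package from a downloaded source tarball.
install_from_tarball <- function(tarball) {
  # Fail early with a clear message if the path is wrong.
  if (!file.exists(tarball)) {
    stop("Source file not found: ", tarball)
  }
  # repos = NULL tells R to treat the first argument as a local file
  # rather than a package name to look up in a repository.
  # type = "source" is an assumption added here to make the intent explicit
  # on systems that default to binary packages.
  install.packages(tarball, repos = NULL, type = "source")
}
```

You would call it with the full path to wherever you saved the stringr `.tar.gz` file. One design point to keep in mind: with `repos = NULL`, R does not fetch missing dependencies for you, so any packages the tarball depends on have to be installed beforehand.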
[13:03]Now let us try to install a package from GitHub. Today, we are going to install a package called shiny. When I try to load the library, it gives me an error that says there is no package called shiny. This means that shiny is not installed on our system. To install it, we need an additional package that can download packages from GitHub. I will use a package called remotes, which, like devtools, can help us install packages from GitHub. Since remotes is a CRAN package, we can use the install.packages() function to install it, just as we did for readxl. We install remotes first, and once remotes has been downloaded from CRAN, we can use it to install shiny from GitHub. So let us install remotes. Now that remotes has been installed, let us load the remotes library first, and then use one of the remotes functions to install. When I type install, you can see a lot of suggestions here, because remotes can install packages from various sources. I'm going to choose install_github because that's where I want to install my package from. Now let us look at the path to the repository on GitHub. Here is the GitHub link, which I've already opened in Chrome. This is the shiny package that I want to install. This part of the link is the path that I will be providing to R, so in the function, I will type rstudio/shiny in quotes. That is the place on GitHub where I want to install the shiny package from. Now I will run this. You can see that it prints Downloading GitHub repo rstudio/shiny@HEAD, and it's getting the code and installing the package from there. Now shiny has finished installing, and we do not see any errors here, so we should be able to load the library shiny without any errors.
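The GitHub route just described (install remotes from CRAN, then point it at the repository path) could be sketched as follows. `install_from_github` is a hypothetical wrapper introduced for illustration; `remotes::install_github()` is the function chosen in the video.

```r
# Hypothetical wrapper around remotes::install_github().
install_from_github <- function(repo) {
  # Expect the "owner/repository" form, for example "rstudio/shiny".
  stopifnot(
    is.character(repo),
    length(repo) == 1L,
    grepl("/", repo, fixed = TRUE)
  )
  # remotes itself lives on CRAN, so install it first if it is missing.
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  # Download the repository and build and install the package from it.
  remotes::install_github(repo)
}
```

For the video's example you would run `install_from_github("rstudio/shiny")` and then `library(shiny)`. The devtools package offers a function with the same name, `devtools::install_github()`, so either package works for this step.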
And as you can see, we loaded the library shiny and we do not see any errors, which means that it has been installed successfully from the GitHub repository. Finally, let us try to install a package from Bioconductor. I want to install a package called Biostrings. Biostrings is a Bioconductor package that allows for fast string manipulation of large biological sequences. Let us try to load Biostrings first to make sure that it is not present on our system. When I try to load the library, it tells me that there is no package called Biostrings, which indicates that I need to install it. To get Biostrings, I need a package called BiocManager. BiocManager is a CRAN package that helps us install packages from Bioconductor. Since it is a CRAN package, we install it using the install.packages() function. We put the name of the package in quotes, and then we run this. It seems that BiocManager has been installed, so let us first load the BiocManager library.
[16:52]Now that we have BiocManager installed and loaded successfully, we can use it to install Biostrings from Bioconductor. We write BiocManager, use its function install(), and within install(), we specify the name of the package, that is, Biostrings, and we run this. You can check the installation of Biostrings by loading the library once the install finishes. Here, since my library loaded without any errors, it seems that the package has been installed successfully from Bioconductor. You can also test out a few functions to make sure that the package has been installed correctly. There is a function called DNAString in Biostrings, which allows you to define a DNA string, and once we have this DNA string, we can perform manipulations on it.
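The Bioconductor steps above might be written roughly as below. This is a sketch under stated assumptions: `install_from_bioconductor` is a hypothetical helper name, and the example sequence is made up for illustration because the video's actual sequence is not given in the transcript.

```r
# Hypothetical helper: install a Bioconductor package through BiocManager.
install_from_bioconductor <- function(pkg) {
  # BiocManager is a CRAN package, so it is installed with install.packages().
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkg)
}

# Quick check that Biostrings works. This part only runs when Biostrings is
# already installed, so the script does not trigger a download by itself.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  dna <- Biostrings::DNAString("ATGCGTTA")   # assumed example sequence
  print(dna)
  # Reverse complement of the same sequence.
  print(Biostrings::reverseComplement(dna))
}
```

To follow the video, you would first run `install_from_bioconductor("Biostrings")` and then `library(Biostrings)`. Bioconductor installs can take a while because the dependencies are pulled in as well.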
[18:02]Let us say you want to create the reverse complement of this DNA string. Bioconductor makes that really easy. This was the DNA string, and this is its reverse complement. This is the advantage of using R packages, especially packages from Bioconductor: they have been developed specifically for bioinformatics analysis and manipulation, and the community has already written code that many of us can use, without rewriting it, to perform simple tasks like creating a reverse complement. Now that we have looked at various ways to install packages, it is important to know where our packages are saved. R saves packages in certain paths, and you can find them by calling .libPaths(). When you run this function, you will see all the paths where your packages are stored. If you want to check what is inside these packages, you can go to these locations, open the directory of a package, and take a look. Lastly, if you have to remove a package, you can use the function remove.packages(), providing the name of the package you want to remove in single or double quotes. I want to remove readxl, the first package that we installed, so I'm going to run this. Now when you look up readxl in your Packages tab, it no longer appears. That means it has been removed from your system, and if you want to use functions from readxl, you will have to install it again. So that's all I had for today's video. I hope you found this basic, introductory video helpful and informative.
I hope this is helpful for anyone who is looking to start out in bioinformatics or has been trying to find their way around in R. If you found this video informative and helpful, please make sure you hit the subscribe button, like the video, share it, and leave your comments in the comment section. Until next time, see you.
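As a recap of the housekeeping steps near the end of the video (finding where packages are stored and removing one), here is a small sketch. `remove_if_installed` is a hypothetical helper name added for illustration. The video simply calls `remove.packages()` on readxl directly.

```r
# .libPaths() returns the directories R searches for installed packages.
lib_dirs <- .libPaths()
print(lib_dirs)

# Hypothetical helper: remove a package only if it is currently installed.
remove_if_installed <- function(pkg) {
  installed <- rownames(installed.packages())
  if (pkg %in% installed) {
    remove.packages(pkg)
    return(invisible(TRUE))
  }
  # Nothing to remove.
  invisible(FALSE)
}
```

Running `remove_if_installed("readxl")` would match the last step of the demo. After that, readxl no longer appears in the Packages tab until it is installed again.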



