Data types, solutions • BasicR

Exercises with vectors

Create the following vectors: 1, 1, 3, 4, 5 and 2, 2, 5, 4, 1.
Find the minimum of each vector.
Find the common minimum of the two vectors.
Sum the vectors element-wise, then sum all elements.
Compute the element-wise square root of the element-wise sum of the vectors.
Sort both vectors in decreasing order.
Find the elements that are duplicated in the vectors.
Find out which elements of vector one are in vector two.
Create a vector of 100 random numbers between 1 and 100, sampled with replacement (hint: the sample function).
Find out how many elements are equal to three.
Do it again with new random numbers.
Do it again after running set.seed(23).
Change all elements that are equal to three to four. Check your results.
Create named versions of the first two vectors. Order the second one like the first, based on the names (hint: match).
Combine the two vectors.
Is there any element of vector two that is larger than the respective element of vector one?
Is there any element of vector two that is larger than the largest element of vector one?

#1.
a <- c(1,1,3,4,5)
b <- c(2, 2, 5, 4, 1)
#2.
min(a)

## [1] 1

min(b)

## [1] 1

#3.
min(c(a,b))

## [1] 1

min(a,b)

## [1] 1

#4.
a+b

## [1] 3 3 8 8 6

sum(a+b)

## [1] 28

#5.
sqrt(a+b)

## [1] 1.732051 1.732051 2.828427 2.828427 2.449490

#6.
sort(a, decreasing = TRUE)

## [1] 5 4 3 1 1

sort(b, decreasing = TRUE)

## [1] 5 4 2 2 1

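#7.
# Positions of repeated values (the first occurrence is not flagged)
which(duplicated(a))

## [1] 2

# All elements whose value occurs more than once; %in% also works
# when several different values are duplicated
a[a %in% a[duplicated(a)]]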

## [1] 1 1
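The exercise on which elements of vector one occur in vector two has no solution listed above. A minimal sketch, assuming `%in%` is the intended tool; the result names (`in_b`, `dup_b`) are only illustrative:

```r
a <- c(1, 1, 3, 4, 5)
b <- c(2, 2, 5, 4, 1)

# Logical vector: is each element of a present anywhere in b?
in_b <- a %in% b

# The matching values of a, and their positions in a
a[in_b]
which(in_b)

# The same duplicate check as in #7, applied to the second vector
dup_b <- b[duplicated(b)]
```

Here only the value 3 of the first vector is missing from the second one.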

#9., #10.
# Note: this overwrites the vector a from #1
a <- sample(1:100, 100, replace = TRUE)
sum(a == 3)

## [1] 0
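The remaining vector exercises (seeding, replacing values, named vectors, combining and comparing) have no solution listed. The following is one possible sketch, not the official solution; the names `v` to `z` and all helper variable names are made up for illustration.

```r
a <- c(1, 1, 3, 4, 5)
b <- c(2, 2, 5, 4, 1)

# Setting the same seed before each draw gives identical "random" vectors
set.seed(23)
r1 <- sample(1:100, 100, replace = TRUE)
set.seed(23)
r2 <- sample(1:100, 100, replace = TRUE)
same_draw <- identical(r1, r2)

# Replace every 3 by 4, then check that no 3 is left
r1[r1 == 3] <- 4
any(r1 == 3)

# Named copies of the first two vectors (hypothetical names)
a_named <- setNames(a, c("v", "w", "x", "y", "z"))
b_named <- setNames(b, c("z", "y", "x", "w", "v"))

# match() returns, for each name of a_named, its position in b_named
b_ordered <- b_named[match(names(a_named), names(b_named))]

# Combine the two vectors
ab <- c(a, b)

# Element-wise comparison, then comparison against the largest element of a
any(b > a)
any(b > max(a))
```

With these vectors the element-wise comparison is true for the first three positions, while no element of the second vector exceeds the maximum of the first (both maxima are 5).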

Exercises with data frames

Load the iris dataset (data).
Extract the Petal.Length column as a vector. Do it by column name and by column index as well.
Create a data frame with the columns Sepal.Width, Sepal.Length and Species.
Get the maximum Petal.Width for the species setosa.
Get the second element of the Sepal.Width column.
How many setosa are there with a Petal.Width of 0.2?

data("iris")
plength <- iris[,"Petal.Length"]
plength <- iris[,3]
iris$Petal.Length

##   [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4
##  [19] 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2
##  [37] 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7 4.5 4.9 4.0
##  [55] 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1 4.5 3.9 4.8 4.0
##  [73] 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0
##  [91] 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3
## [109] 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9 5.0 5.7 4.9 6.7 4.9 5.7 6.0
## [127] 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9
## [145] 5.7 5.2 5.0 5.2 5.4 5.1

new_df <- iris[,c("Sepal.Width", "Sepal.Length", "Species")]
new_df <- iris[,c(2,1,5)]
max(iris[iris$Species=="setosa", "Petal.Width"])

## [1] 0.6

iris[2, "Sepal.Width"]

## [1] 3

sum(iris[iris$Species=="setosa", "Petal.Width"]==0.2)

## [1] 29
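As a cross-check of the solutions above, the sketch below confirms that the three ways of extracting a column agree and repeats the setosa queries with `subset()`. The tolerance-based count is an assumption about good practice when comparing decimals, not part of the original solution; variable names are illustrative.

```r
data("iris")

# Three equivalent ways to get the column as a plain vector
by_name   <- iris[, "Petal.Length"]
by_index  <- iris[, 3]
by_dollar <- iris$Petal.Length
identical(by_name, by_index)

# drop = FALSE keeps the result as a one-column data frame
one_col_df <- iris[, "Petal.Length", drop = FALSE]

# subset() as an alternative to logical row indexing
setosa <- subset(iris, Species == "setosa")
max_width <- max(setosa$Petal.Width)

# Comparing decimals with a small tolerance instead of ==
n_02 <- sum(abs(setosa$Petal.Width - 0.2) < 1e-8)
```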

Exercises with regular expressions

dataset <- data.frame(
  Patient.ID = c("normal_01", "normal_02", "normal_03", "tumor_01", "tumor_02", "tumor_02"),
  Sentrix.position = c("A01B01", "A01B02", "A016A01", "B02A02", "C01D02", "C02C01"),
  Treatment = c("Treated", "Treated", "Not treated", "Treated", "Treated", "Not treated"),
  value = c(3.25, 3.67, 4.26, 6.24, 5.78, 7.32),
  row.names = c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6")
)

Create a column with the sample type (tumor or normal).
Tabulate treatment versus sample type.
Add an "_" to the sample names, e.g. Sample_3.
Sum all values that come from normal samples.
Change all “A”s in the Sentrix.position column to “E”s.
Change the “E”s back to “A”s if they appear second. Make the solution as general as possible.

#Examples:
grep("normal", dataset$Patient.ID)

## [1] 1 2 3

grep("norm", dataset$Patient.ID)

## [1] 1 2 3

grep("nom", dataset$Patient.ID)

## integer(0)

grepl("normal", dataset$Patient.ID)

## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

grepl("[[:alpha:]]", dataset$Patient.ID)

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

grepl("[[:alpha:]]{5}", dataset$Patient.ID)

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

grepl("[[:alpha:]]{6}", dataset$Patient.ID)

## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

grepl("[[:alpha:]]_[[:digit:]]", dataset$Patient.ID)

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

grepl("[[:alpha:]]{6}_[[:digit:]]{2}", dataset$Patient.ID)

## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

regexec("[[:alpha:]]_[[:digit:]]", dataset$Patient.ID)

## [[1]]
## [1] 6
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[2]]
## [1] 6
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[3]]
## [1] 6
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[4]]
## [1] 5
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[5]]
## [1] 5
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[6]]
## [1] 5
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE

gregexpr("[[:alpha:]]_[[:digit:]]", dataset$Patient.ID)

## [[1]]
## [1] 6
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[2]]
## [1] 6
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[3]]
## [1] 6
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[4]]
## [1] 5
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[5]]
## [1] 5
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[6]]
## [1] 5
## attr(,"match.length")
## [1] 3
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
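The position and length information returned by regexec and gregexpr is easier to read when turned back into text. A small sketch using `regmatches()` from base R; the second pattern with capture groups is an illustrative addition, not one of the examples above.

```r
ids <- c("normal_01", "normal_02", "normal_03", "tumor_01", "tumor_02", "tumor_02")
pattern <- "[[:alpha:]]_[[:digit:]]"

# First match per string (regexec) and all matches per string (gregexpr)
first_match <- regmatches(ids, regexec(pattern, ids))
all_matches <- regmatches(ids, gregexpr(pattern, ids))
unlist(first_match)

# With capture groups, regexec also returns the text of each group:
# element 1 is the full match, the following elements are the groups
parts <- regmatches(ids, regexec("([[:alpha:]]+)_([[:digit:]]+)", ids))
parts[[1]]
```

Each ID contains the pattern exactly once, so both calls extract the same three-character pieces ("l_0" for the normal samples, "r_0" for the tumor samples).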

gsub("_", ".", dataset$Patient.ID)

## [1] "normal.01" "normal.02" "normal.03" "tumor.01"  "tumor.02"  "tumor.02"

gsub(".", "_", dataset$Patient.ID)

## [1] "_________" "_________" "_________" "________"  "________"  "________"

gsub("\\.", "_", dataset$Patient.ID)

## [1] "normal_01" "normal_02" "normal_03" "tumor_01"  "tumor_02"  "tumor_02"

gsub(".", "_", dataset$Patient.ID, fixed = TRUE)

## [1] "normal_01" "normal_02" "normal_03" "tumor_01"  "tumor_02"  "tumor_02"

gsub("([[:alpha:]]{5,6})_([[:digit:]]{2})", "\\2", dataset$Patient.ID)

## [1] "01" "02" "03" "01" "02" "02"

gsub("([[:alpha:]]{5,6})_([[:digit:]]{2})", "\\1", dataset$Patient.ID)

## [1] "normal" "normal" "normal" "tumor"  "tumor"  "tumor"

gsub("([A-Za-z]{5,6})_([[:digit:]]{2})", "\\1", dataset$Patient.ID)

## [1] "normal" "normal" "normal" "tumor"  "tumor"  "tumor"

dataset$Sample_type <- gsub("([A-Za-z]{5,6})_([[:digit:]]{2})", "\\1", dataset$Patient.ID)

rownames(dataset) <- gsub("Sample", "Sample_", rownames(dataset))
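The table, the sum over normal samples and the two Sentrix.position replacements are not solved above. One possible sketch follows. It assumes that "appear second" means an E that is the second letter of the position string, which is only one reading of the task; the helper names (`tab`, `normal_sum`, `all_e`, `back`) are illustrative.

```r
dataset <- data.frame(
  Patient.ID = c("normal_01", "normal_02", "normal_03", "tumor_01", "tumor_02", "tumor_02"),
  Sentrix.position = c("A01B01", "A01B02", "A016A01", "B02A02", "C01D02", "C02C01"),
  Treatment = c("Treated", "Treated", "Not treated", "Treated", "Treated", "Not treated"),
  value = c(3.25, 3.67, 4.26, 6.24, 5.78, 7.32),
  row.names = paste0("Sample", 1:6),
  stringsAsFactors = FALSE
)

# Sample type: drop the trailing "_<digits>" from the patient ID
dataset$Sample_type <- sub("_[[:digit:]]+$", "", dataset$Patient.ID)

# Treatment versus sample type
tab <- table(dataset$Treatment, dataset$Sample_type)

# Sum of all values from normal samples
normal_sum <- sum(dataset$value[dataset$Sample_type == "normal"])

# Change every "A" to "E"
all_e <- gsub("A", "E", dataset$Sentrix.position)

# Change an "E" back to "A" only when it is the second letter:
# one letter, one or more digits, then the "E" to replace
back <- sub("^([[:alpha:]][[:digit:]]+)E", "\\1A", all_e)
```

The pattern does not hard-code the number of digits, so it handles both "B02E02" and "E016E01"; strings whose second letter is not an E are returned unchanged.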

Exercises with factors

Check if there are any factors in dataset.
Turn the sample type column into a factor.
Add an “unknown” level.
Order the levels so that the first is “tumor”.
Order the levels by the mean of value, in decreasing order.

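# Derive the sample type from the patient ID and store it as a factor
dataset$Sample_type <- ifelse(grepl("normal", dataset$Patient.ID), "normal", "tumor")
dataset$Sample_type <- factor(dataset$Sample_type, levels = c("normal", "tumor"))
# as.factor() leaves a column that is already a factor unchanged
dataset$Sample_type <- as.factor(dataset$Sample_type)
# Add an "unknown" level
dataset$Sample_type <- factor(dataset$Sample_type, levels = c("normal", "tumor", "unknown"))
# Reorder the levels so that "tumor" comes first
dataset$Sample_type <- factor(dataset$Sample_type, levels = c("tumor", "normal", "unknown"))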
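The first and last factor exercises (checking for factors, ordering levels by the mean of value) are not covered by the lines above. A possible sketch on a reduced copy of the data; placing the empty "unknown" level last is a choice made here, since a level without observations has no mean.

```r
dataset <- data.frame(
  Patient.ID = c("normal_01", "normal_02", "normal_03", "tumor_01", "tumor_02", "tumor_02"),
  value = c(3.25, 3.67, 4.26, 6.24, 5.78, 7.32),
  stringsAsFactors = FALSE
)

# Is any column a factor? (FALSE here, because strings were kept as character)
has_factor_before <- any(sapply(dataset, is.factor))

dataset$Sample_type <- factor(
  ifelse(grepl("normal", dataset$Patient.ID), "normal", "tumor"),
  levels = c("normal", "tumor", "unknown")
)
has_factor_after <- any(sapply(dataset, is.factor))

# Mean of value per level; the empty "unknown" level yields NA
group_means <- tapply(dataset$value, dataset$Sample_type, mean)

# sort() drops the NA, so levels without data are appended afterwards
by_mean <- names(sort(group_means, decreasing = TRUE))
new_levels <- c(by_mean, setdiff(levels(dataset$Sample_type), by_mean))
dataset$Sample_type <- factor(dataset$Sample_type, levels = new_levels)
levels(dataset$Sample_type)
```

The tumor samples have the higher mean value, so "tumor" ends up as the first level.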

Lists

Create a list with 5 elements, each of a different class.
Create a list with one vector, one list, one matrix and one number. Name the list elements. Access the third element by name.
Delete the second element of the above list.
Add a data frame to the end of the list. Access the element in the 3rd row, 2nd column of that data frame.
Create a list with two elements, where each element has two sublists.

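# Five elements, each of a different class
my_list <- list("a", c(1,2), TRUE, factor(c("apple", "oranges")), list(1,2))

# A named list; assigning NULL deletes the second element
my_list <- list(vec=c(1,2), ll=list(1,2,3), mat=matrix(1:9, nrow=3), num=3)
my_list[[2]] <- NULL

# The list now has three elements, so index 4 appends the data frame
my_list[[4]] <- data.frame(one=c(1,2), two=c("one", "two"), stringsAsFactors = FALSE)

# General way to append to the end of a list
my_list[[length(my_list)+1]] <- "empty"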
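Access by name, the data frame cell and the nested list are not shown above. A sketch under two assumptions: the appended data frame gets a third row so that row 3 exists (the two-row data frame above would return NA), and the element name `df` plus the names inside `nested` are made up for illustration.

```r
my_list <- list(vec = c(1, 2), ll = list(1, 2, 3), mat = matrix(1:9, nrow = 3), num = 3)

# Access the third element by name ([["mat"]] and $mat are equivalent)
third <- my_list[["mat"]]

# Delete the second element by name
my_list[["ll"]] <- NULL

# Append a data frame under a new name, then read row 3, column 2
my_list[["df"]] <- data.frame(
  one = c(1, 2, 3),
  two = c("one", "two", "three"),
  stringsAsFactors = FALSE
)
cell <- my_list[["df"]][3, 2]

# Two elements, each holding two sublists
nested <- list(
  first  = list(list(1), list(2)),
  second = list(list("a"), list("b"))
)
```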