To start, go to this link and scroll down to “Download Data”. From there, sort by site and download the “BART” dataset for the years 2013-2023. The compressed folder should contain a list of six folders, each with the year in its name. Store it somewhere on your desktop for now.
From there, follow the instructions from last week’s lectures on barracudar functions to create a new project, storing the dataset in an “Original Data” folder.
# First, create a directory path.
directory_path <- "/Users/danielpr23/Desktop/Computational Biology/DanielPenadosBio6100/NEON_count-landbird" # Change this to the directory that contains all the downloaded folders.
# Create a character vector with the names of the folders. This vector is used as the search template.
folders <- list.dirs(directory_path, full.names = FALSE, recursive = FALSE)
print(folders)
## [1] "NEON.D01.BART.DP1.10003.001.2015-06.basic.20240127T000425Z.RELEASE-2024"
## [2] "NEON.D01.BART.DP1.10003.001.2016-06.basic.20240127T000425Z.RELEASE-2024"
## [3] "NEON.D01.BART.DP1.10003.001.2017-06.basic.20240127T000425Z.RELEASE-2024"
## [4] "NEON.D01.BART.DP1.10003.001.2018-06.basic.20240127T000425Z.RELEASE-2024"
## [5] "NEON.D01.BART.DP1.10003.001.2019-06.basic.20240127T000425Z.RELEASE-2024"
## [6] "NEON.D01.BART.DP1.10003.001.2020-06.basic.20240127T000425Z.RELEASE-2024"
## [7] "NEON.D01.BART.DP1.10003.001.2020-07.basic.20240127T000425Z.RELEASE-2024"
## [8] "NEON.D01.BART.DP1.10003.001.2021-06.basic.20240127T000425Z.RELEASE-2024"
## [9] "NEON.D01.BART.DP1.10003.001.2022-06.basic.20240127T000425Z.RELEASE-2024"
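# Because the files are not all named the same, use a regular expression to find them.
# Dots are escaped (\\.) so that they match a literal "." rather than any character.
file_pattern <- "NEON\\.D01\\.BART\\.DP1\\.10003\\.001\\.brd_countdata\\..*\\.csv$" # Adjust the pattern as needed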
# I tried to use the barracudar function add_folder(), but I could not get it to work. I found this code online; it uses an if statement to do basically the same thing: if the folder new_folder does not exist, it creates it. To create several folders, loop over a character vector of names and run the same check for each one.
destination_folder <- "new_folder" # Choose the name of the folder. The code should work with whatever name you choose.
if (!dir.exists(destination_folder)) {
  dir.create(destination_folder)
}
# Now loop over the folders, searching each one for files that match file_pattern.
for (i in folders) {
  # folders holds names only (full.names = FALSE), so build the full path to each folder.
  matching_files <- list.files(path = file.path(directory_path, i), pattern = file_pattern, full.names = TRUE)
  if (length(matching_files) > 0) { # Otherwise, print a warning message that no file was found.
    for (j in matching_files) { # Inner loop: copy each file that was found into destination_folder (here, new_folder).
      file.copy(j, file.path(destination_folder, basename(j)))
    }
  } else {
    cat("No files matching", file_pattern, "found in", i, "\n") # Warning message.
  }
}
# If it works, the files should be stored in a new folder in your working directory.
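The copy step above can also be wrapped in a small function, which makes it easier to try out on toy data before pointing it at the real download. The following is only a sketch under stated assumptions: `copy_matching_files()` is a hypothetical helper (it is not part of barracudar), and the folder and file names in the demonstration are made up.

```r
# Hypothetical helper (not from barracudar): copy every file whose name matches
# `pattern` from each immediate subfolder of `source_dir` into `dest_dir`.
copy_matching_files <- function(source_dir, dest_dir, pattern) {
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  subfolders <- list.dirs(source_dir, full.names = TRUE, recursive = FALSE)
  copied <- character()
  for (folder in subfolders) {
    matches <- list.files(folder, pattern = pattern, full.names = TRUE)
    if (length(matches) == 0) {
      message("No files matching ", pattern, " found in ", folder)
      next
    }
    # file.copy() is vectorised over equal-length `from` and `to` vectors.
    ok <- file.copy(matches, file.path(dest_dir, basename(matches)))
    copied <- c(copied, basename(matches)[ok])
  }
  copied
}

# Toy demonstration in a temporary directory (made-up folder and file names).
src <- file.path(tempdir(), "copy_demo_src")
dst <- file.path(tempdir(), "copy_demo_dst")
unlink(c(src, dst), recursive = TRUE)
dir.create(file.path(src, "site_2015"), recursive = TRUE)
dir.create(file.path(src, "site_2016"), recursive = TRUE)
file.create(file.path(src, "site_2015", "demo.brd_countdata.2015-06.csv"))
file.create(file.path(src, "site_2015", "demo.brd_perpoint.2015-06.csv"))
file.create(file.path(src, "site_2016", "demo.brd_countdata.2016-06.csv"))

copied <- copy_matching_files(src, dst, "brd_countdata.*\\.csv$")
print(sort(copied))
```

Only the two `brd_countdata` files should end up in the destination folder; the `brd_perpoint` file is skipped because it does not match the pattern.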
library(stringr) # Provides str_extract(), used inside the loop.
# Set the directory where the files are.
directory <- "/Users/danielpr23/Desktop/Computational Biology/DanielPenadosBio6100/NEON_count-landbird/new_folder"
# Create a list of the files in that directory.
files <- list.files(directory, pattern = "NEON\\.D01\\.BART\\.DP1\\.10003\\.001\\.brd_countdata\\..*\\.csv$", full.names = TRUE)
# Begin the loop over each file i in files.
for (i in files) {
  # I was not able to extract just the year from the file name; this is the closest I got. The pattern matches the year and month (YYYY-MM).
  year <- str_extract(basename(i), "\\d{4}-\\d{2}")
  data <- read.csv(i)
  # First, remove the rows with NA values in the scientificName column.
  data <- data[complete.cases(data$scientificName), ]
  # Print the results.
  cat("File:", basename(i), "\n")
  cat("Date:", year, "\n")
  cat("Abundance:", nrow(data), "\n") # Abundance is just the number of rows.
  cat("Richness:", length(unique(data$scientificName)), "\n") # Richness is the number of unique values in the scientificName column.
  cat("\n")
}
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2015-06.basic.20231226T232626Z.csv
## Date: 2015-06
## Abundance: 454
## Richness: 40
##
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2016-06.basic.20231227T013428Z.csv
## Date: 2016-06
## Abundance: 883
## Richness: 39
##
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2017-06.basic.20231227T094709Z.csv
## Date: 2017-06
## Abundance: 685
## Richness: 35
##
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2018-06.basic.20231228T172744Z.csv
## Date: 2018-06
## Abundance: 772
## Richness: 37
##
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2019-06.basic.20231227T184129Z.csv
## Date: 2019-06
## Abundance: 628
## Richness: 44
##
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2020-06.basic.20231227T224944Z.csv
## Date: 2020-06
## Abundance: 626
## Richness: 46
##
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2020-07.basic.20231227T225020Z.csv
## Date: 2020-07
## Abundance: 89
## Richness: 18
##
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2021-06.basic.20231228T010546Z.csv
## Date: 2021-06
## Abundance: 1015
## Richness: 50
##
## File: NEON.D01.BART.DP1.10003.001.brd_countdata.2022-06.basic.20231229T053256Z.csv
## Date: 2022-06
## Abundance: 699
## Richness: 39
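The loop above reports the date as year and month together. One way to get just the year is to match the `YYYY-MM` stamp first and then take its first four characters. This is a minimal base R sketch, assuming every file name contains exactly one such stamp; the two file names are copied from the output above.

```r
# Two file names taken from the printed output above.
file_names <- c(
  "NEON.D01.BART.DP1.10003.001.brd_countdata.2015-06.basic.20231226T232626Z.csv",
  "NEON.D01.BART.DP1.10003.001.brd_countdata.2020-07.basic.20231227T225020Z.csv"
)

# regexpr() finds the first "four digits, hyphen, two digits" run in each name;
# regmatches() returns the matched text.
year_month <- regmatches(file_names, regexpr("[0-9]{4}-[0-9]{2}", file_names))

# Split the stamp into its numeric parts.
year <- as.integer(substr(year_month, 1, 4))
month <- as.integer(substr(year_month, 6, 7))

print(year)   # 2015 2020
print(month)  # 6 7
```

Keeping the month as a separate value is useful here because 2020 has two files (June and July), so the year alone does not identify a file.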
# Create a new data frame to store the values.
results <- data.frame(File = character(),
                      Date = character(),
                      Abundance = numeric(),
                      Richness = numeric())
# Loop through each CSV file.
for (i in files) {
  data <- read.csv(i)
  data <- data[complete.cases(data$scientificName), ]
  # This time, instead of just printing the results, store them in new variables.
  abundance <- nrow(data)
  richness <- length(unique(data$scientificName))
  year <- str_extract(basename(i), "\\d{4}-\\d{2}")
  # Use rbind to add one row to the data frame on each pass through the loop.
  results <- rbind(results, data.frame(File = basename(i),
                                       Date = year,
                                       Abundance = abundance,
                                       Richness = richness))
}
print(results)
## File
## 1 NEON.D01.BART.DP1.10003.001.brd_countdata.2015-06.basic.20231226T232626Z.csv
## 2 NEON.D01.BART.DP1.10003.001.brd_countdata.2016-06.basic.20231227T013428Z.csv
## 3 NEON.D01.BART.DP1.10003.001.brd_countdata.2017-06.basic.20231227T094709Z.csv
## 4 NEON.D01.BART.DP1.10003.001.brd_countdata.2018-06.basic.20231228T172744Z.csv
## 5 NEON.D01.BART.DP1.10003.001.brd_countdata.2019-06.basic.20231227T184129Z.csv
## 6 NEON.D01.BART.DP1.10003.001.brd_countdata.2020-06.basic.20231227T224944Z.csv
## 7 NEON.D01.BART.DP1.10003.001.brd_countdata.2020-07.basic.20231227T225020Z.csv
## 8 NEON.D01.BART.DP1.10003.001.brd_countdata.2021-06.basic.20231228T010546Z.csv
## 9 NEON.D01.BART.DP1.10003.001.brd_countdata.2022-06.basic.20231229T053256Z.csv
## Date Abundance Richness
## 1 2015-06 454 40
## 2 2016-06 883 39
## 3 2017-06 685 35
## 4 2018-06 772 37
## 5 2019-06 628 44
## 6 2020-06 626 46
## 7 2020-07 89 18
## 8 2021-06 1015 50
## 9 2022-06 699 39
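Growing a data frame with `rbind()` inside a loop works fine for nine files. An alternative pattern is to write a function that summarises one file and then combine the per-file rows in a single step. The sketch below is an illustration only: `summarise_counts()` is a hypothetical helper, and the two CSV files are toy data written to a temporary folder, not the NEON files.

```r
# Hypothetical helper: summarise one count-data CSV as a one-row data frame.
summarise_counts <- function(path) {
  counts <- read.csv(path, stringsAsFactors = FALSE)
  # Drop the rows with no species name, as in the loop above.
  counts <- counts[!is.na(counts$scientificName), , drop = FALSE]
  data.frame(
    File = basename(path),
    Abundance = nrow(counts),                          # number of rows
    Richness = length(unique(counts$scientificName)),  # number of unique names
    stringsAsFactors = FALSE
  )
}

# Toy data written to a temporary folder (made-up records).
demo_dir <- file.path(tempdir(), "summary_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
write.csv(
  data.frame(scientificName = c("Turdus migratorius", "Turdus migratorius",
                                NA, "Catharus guttatus")),
  file.path(demo_dir, "demo_2015-06.csv"), row.names = FALSE
)
write.csv(
  data.frame(scientificName = c("Vireo olivaceus", "Vireo olivaceus")),
  file.path(demo_dir, "demo_2016-06.csv"), row.names = FALSE
)

# Apply the helper to every file, then bind the rows together once.
demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
demo_results <- do.call(rbind, lapply(demo_files, summarise_counts))
print(demo_results)
```

With the toy data, the first file should give an abundance of 3 and a richness of 2 (the `NA` row is dropped), and the second an abundance of 2 and a richness of 1.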