example_extracting_chunks_BirdNET.Rmd

---
title: "Demo extracting clips based on BirdNET output"
output:
  rmarkdown::github_document
---

Example code for reading BirdNET output and exporting detections in species subfolders

```{r}
#devtools::install_github('BritishTrustForOrnithology/AcousticTools')
library(AcousticTools)

#read a BirdNET output
df <- AcousticTools::read_birdnet_results(folder = "H:/Leiothrix Recordings/ST4037")
head(df)
```

Currently we have all detections. We might want to do some filtering or sampling. For this simple example I'm just going to limit to high scoring detections. More complex approaches, like site- or species-based stratified sampling could be adopted.
```{r}

#keep all high scoring detections 
df <- subset(df, score >=0.9)
```


Assuming files are in recognised YYYYMMDD-HHMMSS format I recommend extracting the date and time as a datetime variable. Then the detection offset (start) can be added to get the start datetime of the detection. First split the filename using stringr::str_split_fixed.
```{r}
library(stringr)

#split the original filename parts
bits <- setNames(as.data.frame(stringr::str_split_fixed(string = basename(df$original_wav), pattern = "-|\\.", n = Inf)), 
                 c('date_str', 'time_str', 'loc', 'rec', 'mic','ext'))
head(bits)
#join to the main df
df <- cbind(df, bits)
```

Now convert this to a date using lubridate and add the start time offset for the detection.
```{r}
library(lubridate)
#extract the datetime of the original recording
df$recording_start_dt <- as.POSIXct(paste(df$date_str, df$time_str), format = "%Y%m%d %H%M%S")

#add seconds offset to get detecion datetime
df$detection_dt <- df$recording_start_dt + df$start

head(df[,c("original_wav", "start", "recording_start_dt", "detection_dt")])
```

Now we can use the detection datetime to make a unique filename. In this example we will put chunks in species folders.

```{r}

#make the new filename. Use __ to infer unknown species at this point
df$newfilename <- paste0(format(df$detection_dt, "%Y%m%d-%H%M%S"),
                         '-',
                         df$loc,
                         '-',
                         df$rec,
                         '-',
                         df$mic,
                         '-__.',
                         df$ext)
                         

#where will chunks be saved?
path_export <- 'C:/exports'

#make the full path and filename of each clip to be exported
df$chunk_fullname <- file.path(path_export, df$birdnet_english_name, df$newfilename)
head(df$chunk_fullname)
```

Now we can do the exporting using AcousticTools::extract_chunk. In this example I will export 5 second chunks centred on the 3 second BirdNet detections.
```{r}
#iterate over chunks - just do first 10 for demo. 
n <- nrow(df)
n <- 10
for(i in 1:n) {
  AcousticTools::extract_chunk(file_wav = df$original_wav[i],
                               file_chunk = df$chunk_fullname[i],
                               start = df$start[i], 
                               end = df$end[i], 
                               chunk_duration = 5,
                               verbose = TRUE)
  
}
```
## Fully anonymised chunk names

If you want to export fully anonymised files this can be done as follows. This date time steps can be ignored in this case. **Remember to export the lookup so you know what L4G4Y06Y3YF2TFF.wav actually was!**
```{r}
library(stringi)
#how many files need anonymised names
num_files <- nrow(df)
#make random strings of 15 characters and/or numbers
df$randstrings <- stringi::stri_rand_strings(n = num_files, length = 15, pattern = "[A-Z0-9]")
#make into filenames
df$chunk_fullname_anonymised <- file.path(path_export, paste0(df$randstrings, ".wav"))
head(df$chunk_fullname_anonymised)

#remember to save the lookup!
save(df, file = file.path(path_export, "detection_to_anonymised_names_lookup.Rdata"))
```