Easy error handling in R with purrr’s possibly
It’s frustrating to see your code choke partway through when you try to apply a function in R. You may know that something in one of those objects caused a problem, but how do you track down the culprit?
The purrr package’s possibly() function is one easy way.
In this example, I’ll demo code that imports multiple CSV files. Most files’ value columns import as characters, but one of them comes in as numbers. Running a function that expects characters as input on that file will cause an error.
For setup, the code below loads several libraries I need and then uses base R’s list.files() function to return a sorted vector with the names of all the files in my data directory.
library(purrr)
library(readr)
library(rio)
library(dplyr)
my_data_files <- list.files("data_files", full.names = TRUE) %>%
  sort()
I can then import the first file and look at its structure.
x <- rio::import("data_files/file1.csv")
str(x)
'data.frame': 3 obs. of 3 variables:
 $ Category     : chr "A" "B" "C"
 $ Value        : chr "$4,256.48 " "$438.22" "$945.12"
 $ MonthStarting: chr "12/1/20" "12/1/20" "12/1/20"
Both the Value and MonthStarting columns are importing as character strings. What I ultimately want is Value as numbers and MonthStarting as dates.
I often deal with issues like this by writing a small function, such as the one below, to make changes to a file after import. It uses dplyr’s transmute() to create a new Month column from MonthStarting as Date objects, and a new Total column from Value as numbers. I also make sure to keep the Category column (transmute() drops all columns not explicitly mentioned).
library(dplyr)
library(lubridate)
process_file <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      Total = readr::parse_number(Value)
    )
}
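To see what transmute() keeps and drops without any files on disk, here is a minimal sketch that runs the same three conversions on a small in-memory data frame. The data frame x_demo is a hypothetical stand-in for file1.csv, built from the values str(x) printed above.

```r
library(dplyr)
library(lubridate)
library(readr)

# Hypothetical stand-in for file1.csv, using the values str(x) showed above
x_demo <- data.frame(
  Category = c("A", "B", "C"),
  Value = c("$4,256.48 ", "$438.22", "$945.12"),
  MonthStarting = c("12/1/20", "12/1/20", "12/1/20"),
  stringsAsFactors = FALSE
)

cleaned <- x_demo %>%
  transmute(
    Category = as.character(Category),  # kept only because it is listed here
    Month = mdy(MonthStarting),         # "12/1/20" becomes the Date 2020-12-01
    Total = parse_number(Value)         # "$4,256.48 " becomes 4256.48
  )

# Value and MonthStarting are gone; only the three named columns remain
names(cleaned)
# [1] "Category" "Month"    "Total"
```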
I like to use readr’s parse_number() function for converting values that come in as character strings because it deals with commas, dollar signs, or percent signs in numbers. However, parse_number() requires character strings as input. If a value is already a number, parse_number() will throw an error.
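A quick console check of both behaviors, as a sketch: the dollar amounts come from file1.csv and the numbers from file4.csv as shown later in this article, while the "45%" string is a hypothetical extra. The error is caught with base R’s tryCatch() only so the script keeps running.

```r
library(readr)

# Character input: parse_number() skips the dollar signs, commas, and percent sign
parsed <- parse_number(c("$4,256.48", "$438.22", "45%"))
parsed

# Numeric input: parse_number() stops with an error; catch it and keep the message
numeric_attempt <- tryCatch(
  parse_number(c(3738, 723, 5494)),
  error = function(e) conditionMessage(e)
)
numeric_attempt
```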
My new function works fine when I test it on the first two files in my data directory using purrr’s map_df() function.
my_results <- map_df(my_data_files[1:2], process_file)
But if I try running my function on all the files, including the one where Value imports as numbers, it will choke.
all_results <- map_df(my_data_files, process_file)
Error: Problem with `mutate()` input `Total`.
x is.character(x) is not TRUE
ℹ Input `Total` is `readr::parse_number(Value)`.
Run `rlang::last_error()` to see where the error occurred.
That error tells me that in one of the files, Value is not a character column, so parse_number() can’t create Total from it. But I’m not sure which file. Ideally, I’d like to run through all the files, flagging the one(s) with problems but still processing all of them instead of stopping at the first error.
possibly() lets me do this by creating a brand new function from my original function:
safer_process_file <- possibly(process_file, otherwise = "Error in file")
The first argument for possibly() is my original function, process_file. The second argument, otherwise, tells possibly() what to return if there is an error.
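The same wrapping can be tried on a toy function. In this minimal sketch, strict_number() is a hypothetical stand-in for process_file(): like readr::parse_number(), it refuses anything that isn’t a character vector. Wrapped with possibly(), it returns the otherwise value instead of stopping.

```r
library(purrr)

# Hypothetical stand-in for process_file(): stops unless given characters
strict_number <- function(x) {
  if (!is.character(x)) stop("x must be a character vector")
  as.numeric(gsub("[$,]", "", x))  # drop dollar signs and commas, then convert
}

# On error, return the otherwise value instead of stopping
safer_number <- possibly(strict_number, otherwise = "Error in file")

safer_number("$4,256.48")  # 4256.48
safer_number(3738)         # "Error in file"
```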
To apply my new safer_process_file() function to all my files, I’ll use purrr’s map() function and not map_df(). map_df() tries to combine every result into a single data frame, but when a file fails, its result won’t be a data frame: it will be the character string I told otherwise to return. map() returns a list, which can hold data frames and character strings side by side.
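A small sketch of that mixed result, with no files involved: the two data frames in inputs are hypothetical stand-ins for imported files (one with file1.csv’s character values, one with file4.csv’s numbers), and to_total() is a hypothetical processing step that, like parse_number(), requires characters.

```r
library(purrr)

# Hypothetical inputs: one character Value column, one numeric
inputs <- list(
  data.frame(Value = c("$4,256.48", "$438.22", "$945.12"), stringsAsFactors = FALSE),
  data.frame(Value = c(3738, 723, 5494))
)

# Hypothetical processing step that stops on non-character input
to_total <- function(df) {
  if (!is.character(df$Value)) stop("Value is not a character column")
  data.frame(Total = as.numeric(gsub("[$,]", "", df$Value)))
}

safer_to_total <- possibly(to_total, otherwise = "Error in file")

# map() returns a list, so a data frame and the error string sit side by side
results <- map(inputs, safer_to_total)
map_chr(results, function(r) class(r)[1])  # "data.frame" "character"
```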
all_results <- map(my_data_files, safer_process_file)
str(all_results, max.level = 1)
List of 5
 $ :'data.frame': 3 obs. of 3 variables:
 $ :'data.frame': 3 obs. of 3 variables:
 $ :'data.frame': 3 obs. of 3 variables:
 $ : chr "Error in file"
 $ :'data.frame': 3 obs. of 3 variables:
You can see here that the fourth item, from my fourth file, is the one with the error. That’s easy to spot with only five items, but it wouldn’t be nearly so easy if I had a thousand files to import and three of them had errors.
If I name the list with my original file names, it’s easier to identify the problem file:
names(all_results) <- my_data_files
str(all_results, max.level = 1)
List of 5
 $ data_files/file1.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file2.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file3.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file4.csv: chr "Error in file"
 $ data_files/file5.csv:'data.frame': 3 obs. of 3 variables:
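With a thousand files, it may be easier to ask R for the problem files directly than to scan str() output. A minimal sketch, assuming a named results list like all_results; all_results_demo below is a hypothetical stand-in, and file5’s numbers are made up for illustration. It uses purrr’s map_lgl(), keep(), and discard().

```r
library(purrr)

# Hypothetical results list shaped like the named all_results: each element is
# either a data frame or the "Error in file" string returned by possibly()
all_results_demo <- list(
  "data_files/file1.csv" = data.frame(Total = c(4256.48, 438.22, 945.12)),
  "data_files/file4.csv" = "Error in file",
  "data_files/file5.csv" = data.frame(Total = c(1, 2, 3))
)

# TRUE for every element that is not a data frame, i.e. every failed file
failed <- !map_lgl(all_results_demo, is.data.frame)
names(all_results_demo)[failed]  # "data_files/file4.csv"

# Split the list into usable results and problem files
good <- keep(all_results_demo, is.data.frame)
problems <- discard(all_results_demo, is.data.frame)
```

Testing with is.data.frame() rather than is.character() means the check still works if otherwise is later changed to something like NULL or NA.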
I can even save the results of str() to a text file for further analysis.
# str() prints its output rather than returning it, so wrap the call in capture.output()
capture.output(str(all_results, max.level = 1), file = "results.txt")
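Another way to narrow down the cause across many files is to check the class of every Value column at once. This is a minimal sketch assuming the imported data frames are already in a named list; imported below is a hypothetical in-memory stand-in, and with real files each element would come from rio::import().

```r
library(purrr)

# Hypothetical stand-ins for imported files, named by file path
imported <- list(
  "data_files/file1.csv" = data.frame(Value = c("$4,256.48 ", "$438.22", "$945.12"),
                                      stringsAsFactors = FALSE),
  "data_files/file4.csv" = data.frame(Value = c(3738, 723, 5494))
)

# Class of the Value column in each file; anything but "character" needs a look
value_classes <- map_chr(imported, ~ class(.x$Value))
value_classes
names(value_classes)[value_classes != "character"]
```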
Now that I know file4.csv is the problem, I can import just that one and confirm what the issue is.
x4 <- rio::import(my_data_files[4])
str(x4)
'data.frame': 3 obs. of 3 variables:
 $ Category     : chr "A" "B" "C"
 $ Value        : num 3738 723 5494
 $ MonthStarting: chr "9/1/20" "9/1/20" "9/1/20"
Ah, Value is indeed coming in as numeric. I’ll revise my process_file() function to account for the possibility that Value isn’t a character string. I use a plain if/else test rather than ifelse(), because ifelse() returns a result the same length as its single TRUE/FALSE test, which would put only the first value into every row of Total:
process_file2 <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      # parse character values; keep values that already arrived as numbers
      Total = if (is.character(Value)) readr::parse_number(Value) else Value
    )
}
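ifelse() returns a result shaped like its test, which matters when the test is a single TRUE or FALSE. A quick base R check, using the Value numbers shown for file4.csv as hypothetical input:

```r
values <- c(3738, 723, 5494)

# One FALSE test, so ifelse() returns one element: only the first value survives
ifelse(is.character(values), "not used", values)
# [1] 3738

# A scalar if/else returns the whole vector from the chosen branch
if (is.character(values)) "not used" else values
# [1] 3738  723 5494
```

Inside transmute(), a length-one result like the first one is recycled to every row, so each file’s Total column would repeat its first value.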
Now if I use purrr’s map_df() with my new process_file2() function, it should work and give me a single data frame.
all_results2 <- map_df(my_data_files, process_file2)
str(all_results2)
str() reports a single data frame with 15 observations of 3 variables: Category as character, Month as Date, and Total as numeric. That’s just the data and format I wanted, and wrapping my original function in possibly() to create a new, error-handling function is what let me find the problem file quickly.
For more R tips, head to the “Do More With R” page on InfoWorld or check out the “Do More With R” YouTube playlist.
Copyright © 2020 IDG Communications, Inc.