Splits a data frame into multiple smaller data frames based on variable expressions, on conditions, or on both at the same time.
Usage
split_by(
  data_frame,
  ...,
  formats = list(),
  inverse = FALSE,
  monitor = .qol_options[["monitor"]]
)
Arguments
- data_frame
A data frame which should be split up into multiple data frames.
- ...
One or more variables and/or conditions by which the provided data frame should be split.
- formats
A list specifying which formats should be applied to which variables.
- inverse
FALSE by default. If TRUE, uses the inverse conditions to split up the data frame.
- monitor
FALSE by default. If TRUE, outputs two charts to visualize the function's time consumption.
Value
Returns a list of data frames split by variable expressions and/or conditions. The list's names are the variable expressions or conditions.
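Because the return value is an ordinary named list, the usual list tools apply to it. The following sketch uses base R's split() as a stand-in to build such a list, then reads the names, extracts one element, and counts the rows per group. The data frame people is hypothetical, and the exact element names produced by split_by() may differ from those shown here.

```r
# Hypothetical example data (not produced by dummy_data())
people <- data.frame(
    sex = c(1, 2, 1, 2, 1),
    age = c(12, 34, 56, 15, 17)
)

# Stand-in for a list of data frames split by a variable
by_sex <- split(people, people$sex)

# Group labels serve as the list names
names(by_sex)

# Extract a single data frame by its name
first_group <- by_sex[["1"]]

# Number of rows in each of the smaller data frames
group_sizes <- vapply(by_sex, nrow, integer(1))
```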
Details
split_by() is based on the explicit OUTPUT statement from 'SAS'. With OUTPUT,
one can, among other things, explicitly tell 'SAS' which observation to write to
which data set. This enables the user to output one observation into one or multiple
data sets.
Instead of subsetting the same data frame multiple times manually, you can subset it multiple times at once with this function.
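To illustrate the kind of manual work this replaces, the base R sketch below applies several conditions to one data frame in a single pass and collects the results in a named list. It does not call split_by() and makes no claim about how the function works internally. The data frame people and the names conditions and subsets are made up for the illustration. Note that the conditions overlap, so one observation can end up in more than one result.

```r
# Hypothetical example data (not produced by dummy_data())
people <- data.frame(
    sex = c(1, 2, 1, 2, 1),
    age = c(12, 34, 56, 15, 17)
)

# Conditions stored as unevaluated expressions, named by their text
conditions <- list(
    "sex == 1" = quote(sex == 1),
    "age < 18" = quote(age < 18)
)

# Evaluate each condition inside the data frame and keep the matching rows
subsets <- lapply(conditions, function(cond) {
    keep <- eval(cond, people)
    people[keep, , drop = FALSE]
})

# One list element per condition
names(subsets)

# Rows 1 and 5 satisfy both conditions and appear in both subsets
nrow(subsets[["sex == 1"]])
nrow(subsets[["age < 18"]])
```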
Examples
# Example data frame
my_data <- dummy_data(1000)
# Split by variable expressions
split_var_df <- my_data |> split_by(sex)
# Split by conditions
split_cond_df <- my_data |> split_by(sex == 1 & age < 18,
                                     sex == 2 & age >= 18)
# Split by condition with inverse group
split_inv_df <- my_data |> split_by(sex == 1, inverse = TRUE)
# Split by variables and conditions
split_combi_df <- my_data |> split_by(state, education,
                                      sex == 1, age < 18)
# Split by variable expressions using formats
state. <- discrete_format(
    "Germany"                       = 1:16,
    "Schleswig-Holstein"            = 1,
    "Hamburg"                       = 2,
    "Lower Saxony"                  = 3,
    "Bremen"                        = 4,
    "North Rhine-Westphalia"        = 5,
    "Hesse"                         = 6,
    "Rhineland-Palatinate"          = 7,
    "Baden-Württemberg"             = 8,
    "Bavaria"                       = 9,
    "Saarland"                      = 10,
    "West"                          = 1:10,
    "Berlin"                        = 11,
    "Brandenburg"                   = 12,
    "Mecklenburg-Western Pomerania" = 13,
    "Saxony"                        = 14,
    "Saxony-Anhalt"                 = 15,
    "Thuringia"                     = 16,
    "East"                          = 11:16)
split_format_df <- my_data |> split_by(state,
                                       formats = list(state = state.))
