These retain functions all have one thing in common: they transfer a value from one case to the next. What they do with this ability can be quite different, so there is a separate function for each use case.

running_number() computes running numbers in a data frame. Without a by variable, the result is the row number. With a by variable, the running number is computed within each group of expressions.

mark_case() sets a flag for the first or last case within the provided by group.

retain_value() retains the first value for all cases of the same group and saves it into a new variable.

retain_sum() retains the summed value for all cases of the same group and saves it into a new variable.

Usage

running_number(data_frame, var_name = "run_nr", by = NULL)

mark_case(data_frame, var_name = "first", by = NULL, first = TRUE)

retain_value(data_frame, var_name = "retain_value", value, by = NULL)

retain_sum(data_frame, var_name = "retain_sum", value, by = NULL)

Arguments

data_frame

The data frame in which to compute retained variables.

var_name

The name of the newly created variable.

by

By group in which to compute the retained variable.

first

mark_case(): If TRUE marks the first case within a group, otherwise the last case.

value

retain_value(): One or multiple variables of which a value should be retained.

retain_sum(): One or multiple variables of which the sum should be retained.

Value

running_number(): Returns the data frame with a new variable containing a running number.

mark_case(): Returns the data frame with a new variable marking first or last cases.

retain_value(): Returns the data frame with a new variable containing a retained value.

retain_sum(): Returns the data frame with a new variable containing a retained sum.

Details

The functions listed here are based on the 'SAS' function retain. On a very basic level, retain can do two things, depending on its position in the 'SAS' code: it can either sort variables column-wise or, since it works row-wise, remember a value from one row to the next. The functions here concentrate on the second part.

Remembering a value from a previous observation offers multiple use cases. For example, always adding 1 to the value of the previous case creates a running number. Likewise, if an observation knows the value of the previous one, it can check whether its own value is the same or different, e.g. to mark first or last cases within a group.
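The idea can be illustrated with a minimal base R sketch. This is not how the functions on this page are implemented; the data frame toy and its columns are hypothetical, and the data is assumed to be sorted by household_id.

```r
# Hypothetical toy data, already sorted by household
toy <- data.frame(
    household_id = c(1, 1, 1, 2, 2, 3),
    income       = c(100, 200, 50, 300, 10, 70)
)
n <- nrow(toy)

# Running number over the whole data frame (previous case + 1)
toy$run_nr <- seq_len(n)

# Running number that restarts within each household
toy$run_nr_hh <- ave(seq_len(n), toy$household_id, FUN = seq_along)

# A case is the first of its group if its id differs from the previous row,
# and the last of its group if its id differs from the next row
toy$first <- c(TRUE, toy$household_id[-1] != toy$household_id[-n])
toy$last  <- c(toy$household_id[-n] != toy$household_id[-1], TRUE)
```

The comparison with the neighbouring row only works on sorted data, which is why the sketch states that assumption up front.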

In its simplest form, it can remember a value from the first observation and transfer it to all other observations.

All of these functions work on the whole data frame as well as on groups, e.g. to transfer a value from the first person in a household to all other persons of the same household.
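As a rough illustration of the household case, the following base R sketch transfers the first value and the group sum to every member of a household. It uses ave() from the stats package rather than the functions on this page; the data frame toy and its columns are hypothetical.

```r
# Hypothetical toy data: three households with 3, 2 and 1 persons
toy <- data.frame(
    household_id = c(1, 1, 1, 2, 2, 3),
    income       = c(100, 200, 50, 300, 10, 70)
)

# Transfer the first person's income to all persons of the same household
toy$first_income <- ave(toy$income, toy$household_id,
                        FUN = function(x) x[1])

# Transfer the household's total income to all of its persons
toy$sum_income <- ave(toy$income, toy$household_id, FUN = sum)
```

ave() keeps the original row order and recycles a single group result across all rows of that group, which matches the "retain for all cases" behaviour described above.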

Examples

# Example data frame
my_data <- dummy_data(1000)

# Get row numbers
my_data <- my_data |> running_number()
my_data <- my_data |> running_number("row_number")

# Running number per variable expression
my_data <- my_data |> running_number(by = year)

# Mark first and last cases
my_data <- my_data |>
    mark_case(by = household_id) |>
    mark_case(var_name = "last", by = household_id, first = FALSE)

# Retain first value inside a group
my_data <- my_data |>
    retain_value(var_name = c("household_weight", "household_income"),
                 value    = c(weight, income),
                 by       = c(state, household_id))

# Retain sum inside a group
my_data <- my_data |>
    retain_sum(var_name = c("weight_hh_sum", "income_hh_sum"),
               value    = c(weight, income),
               by       = c(state, household_id))