The goal of tidysummary is to streamline the analysis of clinical data by automatically selecting appropriate statistical descriptions and inference methods based on variable types.

Prepare your data

Provide a data frame containing the variables to analyze, with variables in columns and observations in rows:

Continuous variables: Numeric.
Categorical variables: Factor (Ordinal Categorical variables: ordered Factor).
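
The type requirements above can be met with base R conversions. The following is a minimal sketch on a made-up data frame; the `patients` object and its column names are hypothetical and only illustrate the three variable types.

```r
# Hypothetical clinical data; names and values are illustrative only.
patients <- data.frame(
  age   = c("54", "61", "47", "70"),   # numbers stored as text
  sex   = c("F", "M", "M", "F"),
  stage = c("II", "I", "III", "II"),
  stringsAsFactors = FALSE
)

# Continuous variable: convert to numeric.
patients$age <- as.numeric(patients$age)

# Categorical variable: convert to an unordered factor.
patients$sex <- factor(patients$sex, levels = c("F", "M"))

# Ordinal categorical variable: ordered factor with an explicit level order.
patients$stage <- factor(patients$stage,
                         levels = c("I", "II", "III"),
                         ordered = TRUE)

# Inspect the resulting column types.
str(patients)
```

Setting the levels explicitly keeps the level order under your control instead of relying on alphabetical sorting.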

add_var()

The add_var() function prepares your dataset for downstream analysis by classifying variables into:

Continuous variables: Further subdivided by normality and equal variance assumptions.
Categorical variables: Further subdivided by ordered status and expected frequency.

Usage

Specify the variables to summarize in var and the grouping variable in group.

library(tidysummary)
library(magrittr)  # provides %>% if it is not already attached
data <- iris %>%
  add_var(var = c("Sepal.Length", "Sepal.Width"), group = "Species")

The function can automatically check normality using statistical tests. The norm argument controls this behavior:

norm

'auto': The default. Checks normality automatically, but behaves like 'ask' when n > 1000.
'ask': Displays the automatic result and QQ plots, then prompts for manual confirmation.
TRUE: Treats all variables as normal.
FALSE: Treats all variables as non-normal.

data <- iris %>%
  add_var(var = c("Sepal.Length", "Sepal.Width"), group = "Species", norm = "ask")
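
To give a feel for what an automatic normality check involves, here is a rough sketch in base R. It assumes a Shapiro-Wilk test per group and a 0.05 cutoff; both are illustrative assumptions, and the tests that add_var() actually runs may differ.

```r
# Shapiro-Wilk p-value for Sepal.Length within each Species (base R only).
# Assumption: this test and the 0.05 cutoff are for illustration only and
# are not taken from tidysummary.
p_values <- tapply(
  iris$Sepal.Length,
  iris$Species,
  function(x) shapiro.test(x)$p.value
)

# Treat the variable as normal only if no group rejects normality.
is_normal <- all(p_values > 0.05)

print(p_values)
print(is_normal)
```

When the automatic result is borderline, norm = "ask" lets you confirm the decision against the QQ plots instead.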

add_summary()

The add_summary() function summarizes the result of add_var() and returns:

A summary data frame with variables as rows and groups as columns.

Usage

Pass the result of add_var() to add_summary():

summary <- data %>%
  add_summary()

To customize the summary style, set the following arguments:

add_overall

TRUE: The default. Includes an “Overall” summary column.
FALSE: Shows only the per-group summary columns.

continuous_format

Format string that overrides both norm_continuous_format and unnorm_continuous_format.

Accepted placeholders are '{mean}', '{SD}', '{median}', '{Q1}', '{Q3}'.

norm_continuous_format

Default is '{mean} ± {SD}'. Accepted placeholders are the same as for continuous_format.

unnorm_continuous_format

Default is '{median} ({Q1}, {Q3})'. Accepted placeholders are the same as for continuous_format.
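
The placeholder mechanism can be pictured as simple text substitution. The sketch below uses a hypothetical helper, `fill_format()`, written in base R; it is not part of tidysummary, and the two-decimal rounding is an assumption made for the example.

```r
# Hypothetical helper: replace each {name} placeholder with its value.
# This is NOT tidysummary's implementation, only an illustration.
fill_format <- function(fmt, values) {
  for (name in names(values)) {
    fmt <- gsub(paste0("{", name, "}"), values[[name]], fmt, fixed = TRUE)
  }
  fmt
}

x <- iris$Sepal.Length

# Statistics matching the accepted placeholders, rounded to 2 decimals
# (the rounding is an assumption for this sketch).
stats <- list(
  mean   = sprintf("%.2f", mean(x)),
  SD     = sprintf("%.2f", sd(x)),
  median = sprintf("%.2f", median(x)),
  Q1     = sprintf("%.2f", quantile(x, 0.25)),
  Q3     = sprintf("%.2f", quantile(x, 0.75))
)

normal_text     <- fill_format("{mean} ± {SD}", stats)
non_normal_text <- fill_format("{median} ({Q1}, {Q3})", stats)

print(normal_text)
print(non_normal_text)  # "5.80 (5.10, 6.40)"
```

Any text outside the braces, such as the parentheses or the ± sign, is carried through unchanged, which is what lets you restyle the output freely.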

categorical_format

Format string for categorical variables. Default is '{n} ({pct})'. Accepted placeholders are '{n}' and '{pct}'.

binary_show

'last': The default. Shows only the last level.
'first': Shows only the first level.
'all': Shows all levels.

summary <- data %>%
  add_summary(add_overall = TRUE,
              continuous_format = "{mean} ± {SD}",
              categorical_format = "{n} ({pct})",
              binary_show = "last")

add_p()

The add_p() function takes the result of add_summary(), adds p-values, and returns:

A summary_with_p data frame with variables as rows and groups as columns.

Usage

Pass the result of add_summary() to add_p():

summary_with_p <- summary %>%
  add_p()

To customize the columns of summary_with_p, set the following arguments:

asterisk

TRUE: Shows asterisk significance markers.
FALSE: The default. Shows p-values.
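
As an illustration of what asterisk markers usually encode, the sketch below maps p-values to stars with a hypothetical helper, `p_to_stars()`. The cutoffs 0.05, 0.01 and 0.001 are the common convention and an assumption here; the thresholds tidysummary actually uses are not stated in this document.

```r
# Hypothetical helper: map p-values to significance stars.
# Assumption: conventional cutoffs (0.05, 0.01, 0.001), which may differ
# from the ones used by add_p(asterisk = TRUE).
p_to_stars <- function(p) {
  as.character(cut(
    p,
    breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
    labels = c("***", "**", "*", ""),
    right  = FALSE   # intervals are [lower, upper), so p = 0.05 gets no star
  ))
}

stars <- p_to_stars(c(0.0004, 0.006, 0.03, 0.2))
print(stars)
```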

add_method

TRUE: Shows the method text.
FALSE: The default. Does not show the method text.
'code': Shows methods as codes, numbered in order of appearance.

add_statistic_name

TRUE: Shows the statistic name.
FALSE: The default. Does not show the statistic name.

add_statistic_value

TRUE: Shows the statistic value.
FALSE: The default. Does not show the statistic value.

summary_with_p <- summary %>%
  add_p(asterisk = TRUE, add_method = "code")