Create column labels and spanners via delimited column names
Source:R/tab_create_modify.R
tab_spanner_delim.Rd
The cols_spanner_delim()
function can take specially-crafted column names
and generate one or more spanners (along with relabeling the
column labels). This is done by splitting the column name by a specified
delimiter character (this is the delim
) and placing the fragments from top
to bottom (i.e., higher-level spanners to the column labels). Furthermore,
the neighboring text fragments on different spanner levels will be coalesced
together to put the span back into spanner. For instance, having the three
side-by-side column names rating_1
, rating_2
, and rating_3
will (in the
default case at least) result in a spanner with the label "rating"
above
columns with the labels "1"
, "2"
, and "3"
. There are many options in
cols_spanner_delim()
to slice and dice delimited column names in different
ways:
the delimiter: choose which delimiter to use for the fragmentation of column names into spanners with the
delim
argumentdirection and amount of splitting: we can choose to split n times according to a
limit
argument, and, we get to specify from which side of the column name the splitting should occurreversal of fragments: we can reverse the order the fragments we get from the splitting procedure
column constraints: define which columns in a gt table that should participate in spanner creation using vectors or tidyselect-style expressions
Usage
tab_spanner_delim(
data,
delim,
columns = everything(),
split = c("last", "first"),
limit = NULL,
reverse = FALSE
)
Arguments
- data
The gt table data object
obj:<gt_tbl>
// requiredThis is the gt table object that is commonly created through use of the
gt()
function.- delim
Delimiter for splitting
scalar<character>
// requiredThe delimiter to use to split an input column name. This should be a single character (e.g.,
"_"
,"."
, etc.).- columns
Columns to target
<column-targeting expression>
// default:everything()
The columns to consider for the splitting, relabeling, and spanner setting operations. Can either be a series of column names provided in
c()
, a vector of column indices, or a select helper function. Examples of select helper functions includestarts_with()
,ends_with()
,contains()
,matches()
,one_of()
,num_range()
, andeverything()
. This argument works in tandem with thespanners
argument.- split
Splitting side
singl-kw:[last|first]
// default:"last"
Should the delimiter splitting occur from the
"last"
instance of thedelim
character or from the"first"
? The default here uses the"last"
keyword, and splitting begins at the last instance of the delimiter in the column name. This option only has some consequence when there is alimit
value applied that is lesser than the number of delimiter characters for a given column name (i.e., number of splits is not the maximum possible number).- limit
Limit for splitting
scalar<numeric|integer|character>
// default:NULL
(optional
)An optional limit to place on the splitting procedure. The default
NULL
means that a column name will be split as many times are there are delimiter characters. In other words, the default means there is no limit. If an integer value is given tolimit
then splitting will cease at the iteration given bylimit
. This works in tandem withsplit
since we can adjust the number of splits from either the right side (split = "last"
) or left side (split = "first"
) of the column name.- reverse
Reverse vector of split names
scalar<logical>
// default:FALSE
Should the order of split names be reversed? By default, this is
FALSE
.
Details on column splitting
If we take a hypothetical table that includes the column names
province.NL_ZH.pop
, province.NL_ZH.gdp
, province.NL_NH.pop
, and
province.NL_NH.gdp
, we can see that we have a naming system that has a
well-defined structure. We start with the more general to the left
("province"
) and move to the more specific on the right ("pop"
). If the
columns are in the table in this exact order, then things are in an ideal
state as the eventual spanner labels will form from this neighboring.
When using tab_spanner_delim()
here with delim
set as "."
we get the
following text fragments:
province.NL_ZH.pop
->"province"
,"NL_ZH"
,"pop"
province.NL_ZH.gdp
->"province"
,"NL_ZH"
,"gdp"
province.NL_NH.pop
->"province"
,"NL_NH"
,"pop"
province.NL_NH.gdp
->"province"
,"NL_NH"
,"gdp"
This gives us the following arrangement of column labels and spanner labels:
--------- `"province"` ---------- <- level 2 spanner
---`"NL_ZH"`--- | ---`"NL_NH"`--- <- level 1 spanners
`"pop"`|`"gdp"` | `"pop"`|`"gdp"` <- column labels ---------------------------------
There might be situations where the same delimiter is used throughout but
only the last instance requires a splitting. With a pair of column names like
north_holland_pop
and north_holland_area
you would only want "pop"
and
"area"
to be column labels underneath a single spanner ("north_holland"
).
To achieve this, the split
and limit
arguments are used and the values
for each need to be split = "last"
and limit = 1
. This will give us
the following arrangement:
--`"north_holland"`-- <- level 1 spanner
`"pop"` | `"area"` <- column labels ---------------------
Examples
With a subset of the towny
dataset, we can create a gt table and then
use the tab_spanner_delim()
function to automatically generate column
spanner labels. In this case we have some column names in the form
population_<year>
. The underscore character is the delimiter that separates
a common word "population"
and a year value. In this default way of
splitting, fragments to the right are lowest (really they become new column
labels) and moving left we get spanners. Let's have a look at how
tab_spanner_delim()
handles these column names:
towny_subset_gt <-
towny |>
dplyr::select(name, starts_with("population")) |>
dplyr::filter(grepl("^F", name)) |>
gt() |>
tab_spanner_delim(delim = "_") |>
fmt_integer()
towny_subset_gt
The spanner created through this use of tab_spanner_delim()
is
automatically given an ID value by gt. Because it's hard to know what the
ID value is, we can use tab_info()
to inspect the table's indices and ID
values.
towny_subset_gt |> tab_info()
From this informational table, we see that the ID for the spanner is
"spanner-population_1996"
. Also, the columns are still accessible by the
original column names (tab_spanner_delim()
did change their labels though).
Let's use tab_style()
to add some styles to the towny_subset_gt
table.
towny |>
dplyr::select(name, starts_with("population")) |>
dplyr::filter(grepl("^F", name)) |>
gt() |>
tab_spanner_delim(delim = "_") |>
fmt_integer() |>
tab_style(
style = cell_fill(color = "aquamarine"),
locations = cells_body(columns = population_2021)
) |>
tab_style(
style = cell_text(transform = "capitalize"),
locations = cells_column_spanners(spanners = "spanner-population_1996")
)
We can plan ahead a bit and refashion the column names with dplyr before
introducing the table to gt()
and tab_spanner_delim()
. Here the column
labels have underscore delimiters where splitting is not wanted (so a period
or space character is used instead). The usage of tab_spanner_delim()
gives
two levels of spanners. We can further touch up the labels after that with
cols_label_with()
and text_transform()
.
towny |>
dplyr::arrange(desc(population_2021)) |>
dplyr::slice_head(n = 5) |>
dplyr::select(name, ends_with("pct")) |>
dplyr::rename_with(
.fn = function(x) {
x |>
gsub("(.*?)_(\\d{4})", "\\1.\\2", x = _) |>
gsub("pop_change", "Population Change", x = _)
}
) |>
gt(rowname_col = "name") |>
tab_spanner_delim(delim = "_") |>
fmt_number(decimals = 1, scale_by = 100) |>
cols_label_with(
fn = function(x) gsub("pct", "%", x)
) |>
text_transform(
fn = function(x) gsub("\\.", " - ", x),
locations = cells_column_spanners()
) |>
tab_style(
style = cell_text(align = "center"),
locations = cells_column_labels()
) |>
tab_style(
style = "padding-right: 36px;",
locations = cells_body()
)
With a summarized, filtered, and pivoted version of the pizzaplace
dataset, we can create another gt table and then use the
tab_spanner_delim()
function with the same delimiter/separator that was
used in the tidyr pivot_wider()
call. We can also process the generated
column labels with cols_label_with()
.
pizzaplace |>
dplyr::select(name, date, type, price) |>
dplyr::group_by(name, date, type) |>
dplyr::summarize(
revenue = sum(price),
sold = dplyr::n(),
.groups = "drop"
) |>
dplyr::filter(date %in% c("2015-01-01", "2015-01-02", "2015-01-03")) |>
dplyr::filter(type %in% c("classic", "veggie")) |>
tidyr::pivot_wider(
names_from = date,
names_sep = ".",
values_from = c(revenue, sold),
values_fn = sum,
names_sort = TRUE
) |>
gt(rowname_col = "name", groupname_col = "type") |>
tab_spanner_delim(delim = ".") |>
sub_missing(missing_text = "") |>
fmt_currency(columns = starts_with("revenue")) |>
data_color(
columns = starts_with("revenue"),
method = "numeric",
palette = c("white", "lightgreen")
) |>
cols_label_with(
fn = function(x) {
paste0(x, " (", vec_fmt_datetime(x, format = "E"), ")")
}
)
See also
tab_spanner()
to manually create spanners with more control over
spanner labels.
Other part creation/modification functions:
tab_caption()
,
tab_footnote()
,
tab_header()
,
tab_info()
,
tab_options()
,
tab_row_group()
,
tab_source_note()
,
tab_spanner()
,
tab_stub_indent()
,
tab_stubhead()
,
tab_style_body()
,
tab_style()