Function textwrap::wrap_algorithms::wrap_optimal_fit
pub fn wrap_optimal_fit<'a, 'b, T: Fragment>(
    fragments: &'a [T],
    line_widths: &'b [f64],
    penalties: &'b Penalties,
) -> Result<Vec<&'a [T]>, OverflowError>
Wrap abstract fragments into lines with an optimal-fit algorithm.
The line_widths slice gives the target line width for each line
(the last slice element is repeated as necessary). This can be
used to implement hanging indentation.
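The "last element is repeated" rule can be sketched as a small lookup. This is an illustration only: target_width is a hypothetical helper, not part of textwrap's API, and the 0.0 fallback for an empty slice is an assumption.

```rust
/// Target width for the line with 0-based index `line_index`: the matching
/// slice element, or the last element once the slice has run out.
/// Hypothetical helper for illustration; an empty slice yields 0.0 here,
/// which is an assumption rather than documented behavior.
fn target_width(line_widths: &[f64], line_index: usize) -> f64 {
    line_widths
        .get(line_index)
        .or(line_widths.last())
        .copied()
        .unwrap_or(0.0)
}
```

Under this rule, passing &[20.0, 16.0] gives a 20-column first line and 16 columns for every later line, which leaves room for a 4-column hanging indent.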
The fragments must already have been split into the desired widths; this function will not (and cannot) attempt to split them further when arranging them into lines.
§Optimal-Fit Algorithm
The algorithm considers all possible break points and picks the
breaks which minimize the gaps at the end of each line. More
precisely, the algorithm assigns a cost or penalty to each break
point, determined by cost = gap * gap where
gap = target_width - line_width. Shorter lines are thus penalized
more heavily since they leave behind a larger gap.
We can illustrate this with the text “To be, or not to be: that is the question”. We will be wrapping it in a narrow column with room for only 10 characters. The greedy algorithm will produce these lines, each annotated with the corresponding penalty:
"To be, or" 1² = 1
"not to be:" 0² = 0
"that is" 3² = 9
"the" 7² = 49
"question" 2² = 4
We see that line four with “the” leaves a gap of 7 columns, which gives it a penalty of 49. The sum of the penalties is 63.
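The greedy penalties above can be reproduced with a short, std-only sketch. It uses only the simplified cost = gap * gap from this section, not the crate's Penalties struct, and it assumes single-space separators and that no word is wider than the column. The names penalty and greedy_wrap are made up for this illustration.

```rust
/// Squared gap left at the end of a line (simplified cost from the text).
/// Assumes `line_width <= target_width`.
fn penalty(target_width: usize, line_width: usize) -> usize {
    let gap = target_width - line_width;
    gap * gap
}

/// Greedy (first-fit) wrapping of whitespace-separated words: keep adding
/// words to the current line until the next one no longer fits.
/// Returns each line together with its penalty.
fn greedy_wrap(text: &str, width: usize) -> Vec<(String, usize)> {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        if current.is_empty() {
            current.push_str(word);
        } else if current.chars().count() + 1 + word.chars().count() <= width {
            current.push(' ');
            current.push_str(word);
        } else {
            // The word does not fit: finish this line and start a new one.
            lines.push(std::mem::take(&mut current));
            current.push_str(word);
        }
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
        .into_iter()
        .map(|line| {
            let cost = penalty(width, line.chars().count());
            (line, cost)
        })
        .collect()
}
```

Calling greedy_wrap("To be, or not to be: that is the question", 10) yields the five lines listed above with penalties 1, 0, 9, 49 and 4.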
There are 10 words and therefore 9 gaps between them, each of which
either is or is not a line break. This means that there are
2_u32.pow(9) or 512 different ways to typeset the text. We can compute
the sum of the penalties for each possible set of line breaks and
search for the one with the lowest sum:
"To be," 4² = 16
"or not to" 1² = 1
"be: that" 2² = 4
"is the" 4² = 16
"question" 2² = 4
The sum of the penalties is 41, which is better than what the greedy algorithm produced.
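The exhaustive search can be sketched directly by treating each of the 9 gaps as one bit of a mask. This is an illustration of the brute-force idea only, not how the crate works. It uses the simplified squared-gap cost, skips arrangements in which a line exceeds the column width, and is only suitable for a handful of words. The function name is hypothetical.

```rust
/// Try every subset of break points and return the cheapest arrangement as
/// (total cost, lines). Returns None if `words` is empty or no arrangement
/// fits. Only meant for small inputs (fewer than 32 words).
fn brute_force_min_cost(words: &[&str], width: usize) -> Option<(usize, Vec<String>)> {
    let n = words.len();
    if n == 0 {
        return None;
    }
    let mut best: Option<(usize, Vec<String>)> = None;
    // Bit i of `mask` set means: break the line after words[i].
    for mask in 0..(1u32 << (n - 1)) {
        let mut lines = Vec::new();
        let mut start = 0;
        for i in 0..n {
            let is_last_word = i == n - 1;
            if is_last_word || (mask & (1u32 << i)) != 0 {
                lines.push(words[start..=i].join(" "));
                start = i + 1;
            }
        }
        // Skip arrangements where some line is wider than the column.
        if lines.iter().any(|line| line.chars().count() > width) {
            continue;
        }
        let cost: usize = lines
            .iter()
            .map(|line| {
                let gap = width - line.chars().count();
                gap * gap
            })
            .sum();
        if best.as_ref().map_or(true, |(c, _)| cost < *c) {
            best = Some((cost, lines));
        }
    }
    best
}
```

For the ten words of the example and a width of 10, the minimum total found this way is 41. More than one arrangement can reach the same minimum, so the lines returned by such a search need not match the listing above exactly.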
Searching through all possible combinations would normally be prohibitively slow. However, it turns out that the problem can be formulated as the task of finding column minima in a cost matrix. This matrix has a special form (totally monotone) which lets us use a linear-time algorithm called SMAWK to find the optimal break points.
This means that the time complexity remains O(n) where n is
the number of words. Compared to wrap_first_fit(), this function is
about 4 times slower.
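To show what is being optimized without reproducing SMAWK, here is a plain dynamic-programming sketch that finds the same minimum in at worst quadratic time. It is deliberately not the linear-time algorithm used by this function. It assumes the simplified squared-gap cost, integer widths and a whitespace width of 1; optimal_breaks is a hypothetical name.

```rust
/// Minimum total squared-gap cost for wrapping words with the given widths
/// into lines of at most `target` columns, plus the index of the first word
/// on each line. Returns None if some word cannot fit on any line.
/// Plain O(n^2) dynamic programming, NOT the SMAWK-based method.
fn optimal_breaks(word_widths: &[usize], target: usize) -> Option<(usize, Vec<usize>)> {
    let n = word_widths.len();
    // best[j]: minimal cost of wrapping the first j words (None = impossible).
    // prev[j]: index of the first word on the last line of that solution.
    let mut best: Vec<Option<usize>> = vec![None; n + 1];
    let mut prev = vec![0usize; n + 1];
    best[0] = Some(0);
    for j in 1..=n {
        let mut line_width = 0;
        // Grow the candidate last line words[i..j] backwards from word j-1.
        for i in (0..j).rev() {
            line_width += word_widths[i] + if i + 1 < j { 1 } else { 0 };
            if line_width > target {
                break;
            }
            if let Some(before) = best[i] {
                let gap = target - line_width;
                let cost = before + gap * gap;
                if best[j].map_or(true, |c| cost < c) {
                    best[j] = Some(cost);
                    prev[j] = i;
                }
            }
        }
    }
    let total = best[n]?;
    // Walk the prev links back from the end to recover the line starts.
    let mut starts = Vec::new();
    let mut j = n;
    while j > 0 {
        starts.push(prev[j]);
        j = prev[j];
    }
    starts.reverse();
    Some((total, starts))
}
```

With the word widths of the example, [2, 3, 2, 3, 2, 3, 4, 2, 3, 8], and a target of 10, this returns a total cost of 41 spread over five lines.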
The optimization of per-line costs over the entire paragraph is inspired by the line breaking algorithm used in TeX, as described in the 1981 article Breaking Paragraphs into Lines by Knuth and Plass. The implementation here is based on Python code by David Eppstein.
§Errors
In case of an overflow during the cost computation, an Err is
returned. Overflows happen when fragments or lines have infinite
widths (f64::INFINITY) or if the widths are so large that the
gaps at the end of lines have sizes larger than f64::MAX.sqrt()
(approximately 1e154):
use textwrap::core::Fragment;
use textwrap::wrap_algorithms::{wrap_optimal_fit, OverflowError, Penalties};

#[derive(Debug, PartialEq)]
struct Word(f64);

impl Fragment for Word {
    fn width(&self) -> f64 { self.0 }
    fn whitespace_width(&self) -> f64 { 1.0 }
    fn penalty_width(&self) -> f64 { 0.0 }
}

// Wrapping overflows because 1e155 * 1e155 = 1e310, which is
// larger than f64::MAX:
assert_eq!(wrap_optimal_fit(&[Word(0.0), Word(0.0)], &[1e155], &Penalties::default()),
           Err(OverflowError));
When using fragment widths and line widths which fit inside a
u64, overflows cannot happen. This means that fragments derived
from a &str cannot cause overflows.
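The arithmetic behind these bounds can be checked in isolation. The sketch below only squares a gap in f64 and tests whether the result is still finite; the crate's actual overflow check may involve further terms, and gap_cost_overflows is a hypothetical helper.

```rust
/// True if squaring `gap` leaves the finite range of f64.
/// Illustrates the arithmetic only; not the crate's own overflow check.
fn gap_cost_overflows(gap: f64) -> bool {
    !(gap * gap).is_finite()
}
```

A gap of 1e155 overflows, while the largest u64 value converted to f64 (about 1.8e19) squares to roughly 3.4e38, far below f64::MAX.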
Note: Only available when the smawk Cargo feature is enabled.