A couple of utilities for improving string usability
Concatenate the elements of a container, placing a joining token between consecutive elements. Elements are formatted using ostreams.
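As a rough illustration of ostream-based joining, here is a minimal standalone sketch. The name `join_sketch` and its signature are hypothetical and are not taken from this library; the only assumption is that each element supports `operator<<`.

```cpp
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not the library's API: joins any range whose elements
// can be streamed, writing `token` between consecutive elements.
template <typename Container>
std::string join_sketch(Container const &items, std::string_view token) {
  std::ostringstream out;
  bool first = true;
  for (auto const &item : items) {
    if (!first) {
      out << token; // the token goes between elements, never before the first
    }
    out << item;
    first = false;
  }
  return out.str();
}
```

With this sketch, `join_sketch(std::vector<int>{1, 2, 3}, ", ")` produces `1, 2, 3`, and an empty container produces an empty string.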
Split a string into a vector of tokens. The tokenizer comes in two versions: normal (Tokenizer) and escapable (EscapableTokenizer). The EscapableTokenizer cannot return string_views, because it may have to alter a token's contents (for example, when unescaping a quote).
Provides the following features:
Discard any token that is the empty string. Enabled by default.
string_utils::Tokenizer split(",");
std::string_view const input = "A,B,C,,D";
split(input); // [ "A", "B", "C", "D" ]
split.ignore_empty_tokens(false);
split(input); // [ "A", "B", "C", "", "D" ]
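To show how a splitter can skip empty tokens while still returning string_views, here is a minimal standalone sketch. `split_sketch` and its parameters are hypothetical stand-ins and do not reflect how the library is implemented.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in, not the library's code. The returned views point
// into `input`, so the underlying buffer must outlive the result.
std::vector<std::string_view> split_sketch(std::string_view input,
                                           std::string_view divider,
                                           bool ignore_empty = true) {
  std::vector<std::string_view> tokens;
  if (divider.empty()) {
    // An empty divider cannot advance the search, so treat the input as one token.
    if (!input.empty() || !ignore_empty) {
      tokens.push_back(input);
    }
    return tokens;
  }
  std::size_t start = 0;
  while (true) {
    std::size_t const end = input.find(divider, start);
    // When `end` is npos, substr clamps the count to the rest of the input.
    std::string_view const token = input.substr(start, end - start);
    if (!token.empty() || !ignore_empty) {
      tokens.push_back(token);
    }
    if (end == std::string_view::npos) {
      break;
    }
    start = end + divider.size();
  }
  return tokens;
}
```

Because no characters are copied, the empty token is only a zero-length view into the input, and discarding it costs nothing beyond the check.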
Limit the number of tokens returned. The default is effectively unlimited (the maximum value of size_t).
string_utils::Tokenizer split(",");
std::string_view const input = "A,B,C,D";
split(input).size(); // 4
split.max_outputs(3);
split(input).size(); // 3
If the input contains more tokens than the maximum allows, you can choose between two behaviors: return the whole unsplit remainder of the input as the last element (the default), or truncate, returning exactly N individual tokens and dropping the rest.
string_utils::Tokenizer split(",");
split.max_outputs(3);
std::string_view const input = "A,B,C,D";
split(input); // [ "A", "B", "C,D" ]
split.truncate(true);
split(input); // [ "A", "B", "C" ]
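The difference between the two modes comes down to what happens when the last output slot is reached. The following is a minimal sketch under stated assumptions: `split_limited` and its parameters are hypothetical, it uses a single-character divider, and it keeps empty tokens for brevity.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch of an output limit; not the library's implementation.
std::vector<std::string_view> split_limited(std::string_view input,
                                            char divider,
                                            std::size_t max_outputs,
                                            bool truncate) {
  std::vector<std::string_view> tokens;
  std::size_t start = 0;
  while (tokens.size() < max_outputs) {
    bool const last_slot = tokens.size() + 1 == max_outputs;
    std::size_t const end = input.find(divider, start);
    if (end == std::string_view::npos || (last_slot && !truncate)) {
      // Either the input is exhausted, or the final slot takes the remainder.
      tokens.push_back(input.substr(start));
      break;
    }
    tokens.push_back(input.substr(start, end - start));
    start = end + 1;
  }
  return tokens;
}
```

With `truncate` set, the last slot is split like any other and the loop simply stops once the limit is reached, so the trailing input is dropped.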
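Tokenize the string from back to front instead of front to back. The tokens are still returned in their original order, but the output limit is counted from the end of the string.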
string_utils::Tokenizer split(",");
split.max_outputs(3);
split.reverse_search(true);
std::string_view const input = "A,B,C,D";
split(input); // [ "A,B", "C", "D" ]
split.truncate(true);
split(input); // [ "B", "C", "D" ]
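One way to picture the reverse search is to peel tokens off the end with `rfind` and restore the original order afterwards. This is only a sketch: `rsplit_limited` and its parameters are hypothetical, and the library may do this differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch of back-to-front tokenizing with an output limit.
std::vector<std::string_view> rsplit_limited(std::string_view input,
                                             char divider,
                                             std::size_t max_outputs,
                                             bool truncate) {
  std::vector<std::string_view> tokens;
  std::string_view rest = input;
  while (tokens.size() < max_outputs) {
    bool const last_slot = tokens.size() + 1 == max_outputs;
    std::size_t const pos = rest.rfind(divider);
    if (pos == std::string_view::npos || (last_slot && !truncate)) {
      // No divider left, or the final slot takes everything still unsplit.
      tokens.push_back(rest);
      break;
    }
    tokens.push_back(rest.substr(pos + 1)); // token after the last divider
    rest = rest.substr(0, pos);             // keep searching the front part
  }
  // Tokens were collected back to front; put them in reading order.
  std::reverse(tokens.begin(), tokens.end());
  return tokens;
}
```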
By providing a special quote character (with an optional escape sequence), it is possible to parse more complicated expressions. This is useful for example with CSV data, as you may need to represent a comma inside one of the fields.
So that the regular Tokenizer can return a vector of string_views, this feature is implemented in a separate class, EscapableTokenizer.
string_utils::Tokenizer split(",");
// CSVs use a quotation mark for quotes, and we'll define doubled quotes as an escaped quote
string_utils::EscapableTokenizer esplit = split.escapable({'"', R"("")"});
std::string_view const input = R"(A,B,"C,D",""E"",F)";
esplit(input); // [ "A", "B", "C,D", "\"E\"", "F" ]
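The reason owned strings are needed becomes clear once an escape sequence has to be collapsed: the token `"E"` above never appears contiguously in the input. The sketch below is a hypothetical standalone parser, not the EscapableTokenizer itself. It assumes that the escape sequence is recognized both inside and outside quotes, which matches the example above but means an empty quoted field cannot be expressed.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch; names and signature are assumptions for illustration.
std::vector<std::string> split_quoted(std::string_view input, char divider,
                                      char quote,
                                      std::string_view escaped_quote) {
  std::vector<std::string> tokens;
  std::string current;
  bool in_quotes = false;
  std::size_t i = 0;
  while (i < input.size()) {
    if (!escaped_quote.empty() &&
        input.substr(i, escaped_quote.size()) == escaped_quote) {
      current += quote; // the escape sequence collapses to one literal quote
      i += escaped_quote.size();
    } else if (input[i] == quote) {
      in_quotes = !in_quotes; // the quote itself is not part of the token
      ++i;
    } else if (input[i] == divider && !in_quotes) {
      tokens.push_back(std::move(current));
      current.clear(); // reset the moved-from string to a known empty state
      ++i;
    } else {
      current += input[i]; // includes dividers that appear inside quotes
      ++i;
    }
  }
  tokens.push_back(std::move(current));
  return tokens;
}
```

Each token is built up character by character in its own buffer, which is why the result is a vector of `std::string` rather than views into the input.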
In GoogleMock, if you don't want to define an ostream operator for your type, you can instead define a function PrintTo(T const &, std::ostream*) in the same namespace as T. GoogleMock then uses argument-dependent lookup (ADL) to find that function and calls it to print the formatted value.
Two cast functions are important: the single-token cast and the multi-token cast.
bool cast(T &out, std::string_view);
bool cast(T &out, std::vector<std::string_view> const &);
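Assuming these functions are discovered through ADL in the same way as PrintTo, a user-defined type could supply both overloads in its own namespace. Everything in this sketch is hypothetical: `app::Point`, `parse_int`, and `parse_into` are invented for illustration, and `parse_into` merely stands in for whatever library code would make the unqualified call.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>
#include <vector>

namespace app {
// Hypothetical user type.
struct Point {
  int x = 0;
  int y = 0;
};

// Hypothetical helper: succeeds only if the whole token is a valid int.
inline bool parse_int(int &out, std::string_view token) {
  char const *const first = token.data();
  char const *const last = first + token.size();
  auto const [ptr, ec] = std::from_chars(first, last, out);
  return ec == std::errc{} && ptr == last;
}

// Multi-token form: expects exactly two tokens, x then y.
inline bool cast(Point &out, std::vector<std::string_view> const &tokens) {
  if (tokens.size() != 2) {
    return false;
  }
  Point parsed;
  if (!parse_int(parsed.x, tokens[0]) || !parse_int(parsed.y, tokens[1])) {
    return false;
  }
  out = parsed; // only modify the output on success
  return true;
}

// Single-token form: accepts "x,y" and forwards to the multi-token form.
inline bool cast(Point &out, std::string_view token) {
  std::size_t const comma = token.find(',');
  if (comma == std::string_view::npos) {
    return false;
  }
  return cast(out, std::vector<std::string_view>{token.substr(0, comma),
                                                 token.substr(comma + 1)});
}
} // namespace app

// Stand-in for a generic caller: the unqualified call lets ADL find app::cast
// because Point lives in namespace app.
template <typename T, typename Source>
bool parse_into(T &out, Source const &source) {
  return cast(out, source);
}
```

Returning `bool` rather than throwing lets the caller decide how to report a failed conversion, and leaving `out` untouched on failure keeps a partially parsed value from leaking out.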