# String Utilities in C++

A couple of utilities for improving string usability.

## Join

Concatenate the elements of a container with a joining token. Uses ostreams.

## Tokenizer/Split

Split a string into a vector of strings. There are two versions of the tokenizer: normal and escapable. The EscapableTokenizer cannot return string\_views, because it may have to doctor the contents (for example, to unescape quotes).

Provides the following features:

### Ignore Empty Tokens

Discard any token that is the empty string. Enabled by default.

``` c++
string_utils::Tokenizer split(",");
std::string_view const input = "A,B,C,,D";
split(input); // [ "A", "B", "C", "D" ]
split.ignore_empty_tokens(false);
split(input); // [ "A", "B", "C", "", "D" ]
```

### Max Outputs

Limit the number of outputs returned. The default is _infinite_ (the maximum value of `size_t`).

``` c++
string_utils::Tokenizer split(",");
std::string_view const input = "A,B,C,D";
split(input).size(); // 4
split.max_outputs(3);
split(input).size(); // 3
```

### Truncate

If the input contains more tokens than the maximum allowed, you can either keep the unsplit remainder of the string in the last element (the default), or truncate and return only the first N tokens.

``` c++
string_utils::Tokenizer split(",");
split.max_outputs(3);
std::string_view const input = "A,B,C,D";
split(input); // [ "A", "B", "C,D" ]
split.truncate(true);
split(input); // [ "A", "B", "C" ]
```

### Reverse Search Order

Instead of tokenizing the string from front to back, do it from back to front. When combined with a maximum, the unsplit remainder (or the truncation) falls at the front of the string instead of the back.

``` c++
string_utils::Tokenizer split(",");
split.max_outputs(3);
split.reverse_search(true);
std::string_view const input = "A,B,C,D";
split(input); // [ "A,B", "C", "D" ]
split.truncate(true);
split(input); // [ "B", "C", "D" ]
```

### Quotes

By providing a special quote character (with an optional escape sequence), it is possible to parse more complicated expressions. This is useful, for example, with CSV data, where you may need to represent a comma inside one of the fields.
So that the regular Tokenizer can keep returning a vector of string\_views, quote handling is stored in a different class, the EscapableTokenizer.

``` c++
string_utils::Tokenizer split(",");
// CSVs use a quotation mark for quotes, and we'll define doubled quotes
// as an escaped quote
string_utils::EscapableTokenizer esplit = split.escapable({'"', R"("")"});
std::string_view const input = R"(A,B,"C,D",""E"",F)";
esplit(input); // [ "A", "B", "C,D", "\"E\"", "F" ]
```

## Cast - Coercing types from strings

In GoogleMock, if you don't want to define an ostream operator for your type, you can define a function `PrintTo(T const &, std::ostream*)` in the same namespace as `T`. GoogleMock then uses ADL to find that function and uses it to print the formatted version.

Two functions are important here: the single-token cast and the multi-token cast.

```
bool cast(T &out, std::string_view);
bool cast(T &out, std::vector const &);
```
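
As a rough illustration of the two signatures above, here is a self-contained sketch that assumes `cast` is meant to be customized the way `PrintTo` is: by defining an overload in the same namespace as your type. Everything in it is hypothetical and not taken from the library: the `demo` namespace, the `Point` type, the `parse_tokens` helper, the `std::from_chars`-based `int` parsing, and the choice of `std::vector<std::string_view>` as the multi-token argument type.

``` c++
#include <charconv>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical stand-in for a single-token cast, written for int only.
// Succeeds only if the whole token parses as an integer.
inline bool cast(int &out, std::string_view token) {
  auto const *first = token.data();
  auto const *last = first + token.size();
  auto const [ptr, ec] = std::from_chars(first, last, out);
  return ec == std::errc{} && ptr == last;
}

namespace demo {
// Hypothetical user type built from exactly two tokens.
struct Point {
  int x = 0;
  int y = 0;
};

// Multi-token cast defined next to Point, so that an unqualified call to
// cast(point, tokens) can find it through argument-dependent lookup.
// (Assumption: the token container is a vector of string_views.)
inline bool cast(Point &out, std::vector<std::string_view> const &tokens) {
  if (tokens.size() != 2) {
    return false;
  }
  Point parsed;
  if (!::cast(parsed.x, tokens[0]) || !::cast(parsed.y, tokens[1])) {
    return false;
  }
  out = parsed; // only overwrite the output once every token has parsed
  return true;
}
} // namespace demo

// Hypothetical generic caller: the unqualified call lets ADL select
// demo::cast when T is demo::Point.
template <typename T>
bool parse_tokens(T &out, std::vector<std::string_view> const &tokens) {
  return cast(out, tokens);
}
```

With these definitions, `parse_tokens(point, tokens)` compiles for `demo::Point` without the caller naming the `demo` namespace, which is the same convenience the `PrintTo` convention provides. Returning `bool` and leaving `out` untouched on failure is a design choice of this sketch, not a documented guarantee of the library.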