To string_view, or not to string_view
Published on: 2024-05-03
A look at the string_view type to better understand when, where, and why it should (or should not) be used.
By Tom Hulton-Harrop
Motivation
At first glance, string_view
seems like an incredibly useful type that can be used in a wide variety of situations. Unfortunately, there are some sharp corners to string_view
which need to be understood to use it effectively. This post hopes to give some guidance around the use of string_view
to ensure our code is clear, efficient and secure.
Discussion
Use string_view as a parameter or local variable only
If you remember one thing about string_view
from this post, this is the most important. string_view
should be used (almost) exclusively as a function parameter or local variable (see the small caveat below as to why it’s ‘almost’).
It’s also valid to use string_view
as a constant global variable (preferably marked constexpr
), as here there are no lifetime issues. The string_view
is almost certainly initialized from a const char*
, which exists for the lifetime of the application. However, this only really makes sense if it will be used by an API expecting a string_view
. If you must interface with a legacy API that expects a const char*
, it’s preferable to store a global const char* const
variable instead.
If you have a routine that needs to do some string manipulation and does not call any other functions expecting a null terminated character sequence (a const char*
), you’re golden. As a quick illustration, here is an example of some code to parse a string and extract several numbers from the sequence (taken from an Advent of Code puzzle).
Example input line:
"departure location: 47-874 or 885-960"
A function to extract digits:
struct range_t {
  int64_t begin;
  int64_t end;
};

const auto range_fn_str = [](int identifier_offset, const std::string& line) {
  auto ranges = line.substr(identifier_offset + 1, line.size());
  auto start = ranges.find(' ');
  auto mid = ranges.find('-');
  auto end = ranges.find(' ', start + 1);
  auto left = ranges.substr(start + 1, mid - start);
  auto right = ranges.substr(mid + 1, end - mid);
  int64_t begin_number = 0, end_number = 0;
  std::from_chars(left.data(), left.data() + left.size(), begin_number);
  std::from_chars(right.data(), right.data() + right.size(), end_number);
  return range_t{begin_number, end_number};
};

// calling code
if (auto identifier = line.find(':'); identifier != std::string::npos) {
  ranges.push_back(range_fn_str(identifier, line));
}
The code isn’t particularly clever, and it’s not even really important to understand what it’s doing. What we should focus on are the calls to substr
. With std::string
, a call to substr
will return a copy of the sub-string (another string
). If the string is small, we’ll benefit from the small string optimization (SSO), but with a longer string we have to go to the heap to allocate memory and construct a new object. When the variable goes out of scope, we then have to destroy and deallocate it. If we change the string
function parameter to instead accept a string_view
, substr
no longer returns a string
, but a string_view
(see footnote 1). This involves no heap allocations at all. We are simply adjusting a pointer and a size value under the hood. If ownership of the string does not change (and we aren’t storing the string_view
anywhere), we can safely use these much more efficient functions. The initial string
version is actually a pessimization of what we need. To show what an impact this can have, let’s compare each version using QuickBench. The string_view
version performs 1.8x faster than the string
version simply by changing a single type. This is where string_view
really shines.
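For reference, here is a sketch of what the string_view version of the function above might look like. The name range_from_view is made up for this illustration, and the substr counts have been tightened slightly so each view covers only the digits. Treat it as one possible shape rather than the exact code that was benchmarked.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>

struct range_t {
  std::int64_t begin;
  std::int64_t end;
};

// Hypothetical string_view counterpart of range_fn_str.
// Every substr call returns a string_view, so no heap allocation takes place.
inline range_t range_from_view(std::size_t identifier_offset, std::string_view line) {
  const std::string_view ranges = line.substr(identifier_offset + 1);
  const auto start = ranges.find(' ');
  const auto mid = ranges.find('-');
  const auto end = ranges.find(' ', start + 1);
  // views over the digits on either side of the '-'
  const std::string_view left = ranges.substr(start + 1, mid - start - 1);
  const std::string_view right = ranges.substr(mid + 1, end - mid - 1);
  std::int64_t begin_number = 0, end_number = 0;
  // from_chars takes a pointer range, so no null terminator is needed
  std::from_chars(left.data(), left.data() + left.size(), begin_number);
  std::from_chars(right.data(), right.data() + right.size(), end_number);
  return range_t{begin_number, end_number};
}
```

The calling code stays the same apart from the function name, because a string converts implicitly to a string_view at the call site.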
string_view and null terminated strings
string_view
is not guaranteed to be null terminated. Any code that assumes string_view
is null terminated is fragile, bug prone and dangerous. If a situation arises where a string_view
needs to be passed to a legacy function, the safest option is to first construct a string
from it using an explicit string
constructor, and then call .c_str()
on that instance.
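A minimal sketch of that pattern, using std::atoi as a stand-in for the legacy function (legacy_to_int is a name invented for this example):

```cpp
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical wrapper around a legacy API that expects a null terminated string.
inline int legacy_to_int(std::string_view text) {
  const std::string owned(text);    // explicit copy, guaranteed to be null terminated
  return std::atoi(owned.c_str());  // safe: c_str() ends exactly where the view ended
}
```

Given std::string_view digits = "12345", calling legacy_to_int(digits.substr(0, 3)) returns 123, whereas passing digits.substr(0, 3).data() straight to std::atoi would read on to the end of the literal and return 12345.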
Aside: A better option would be to update the function accepting a
string_view
and calling a legacy API to take either a string
or const char*
, depending on the situation. It is possible to use const string&
, however this inhibits the compiler from moving strings that don’t use the short string optimization. Accepting strings by value allows efficient moving of values. If null-termination is required and the string doesn’t need ownership, it would be better to accept a const char*
, instead of a const string&
, because the const string&
can create an unnecessary heap allocation when the string
has to be constructed.
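To make the two alternatives in the aside concrete, here is a small sketch. Both logger_t and legacy_length are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Needs ownership: accept the string by value and move it into place.
// Callers holding an rvalue pay for a move, not a copy.
class logger_t {
 public:
  explicit logger_t(std::string prefix) : prefix_(std::move(prefix)) {}
  const std::string& prefix() const { return prefix_; }

 private:
  std::string prefix_;
};

// Needs null termination but not ownership: accept a const char*.
// No std::string has to be constructed just to make the call.
inline std::size_t legacy_length(const char* text) {
  assert(text != nullptr);
  return std::strlen(text);
}
```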
One source of confusion and complexity around string_view
is if we construct a string_view
from a string literal like so:
std::string_view greeting = "good morning";
This specific instance of string_view
will be null terminated, purely by the fact that it happens to point to a complete C string.
Quoting from the C++ standard:
5.13.5 String literals [lex.string]
…
14. After any necessary concatenation, in translation phase 7 (5.2), '\0' is appended to every string literal so that programs that scan a string can find its end.
Relying on this is incredibly dangerous though, as in future greeting
could be updated to be constructed in another way where the null terminator is not present. We must assume the null terminator is never there when dealing with string_view
. When working with existing library code (e.g. atoi
and stoi
), string_view
is best avoided entirely.
The C++ standard also has this to say:
24.4.2.4 Element access [string.view.access]
…
14. [Note: Unlike basic_string::data() and string literals, data() may return a pointer to a buffer that is not null-terminated. Therefore it is typically a mistake to pass data() to a function that takes just a const charT* and expects a null-terminated string. — end note]
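When the goal is simply to parse a number, one way to sidestep the problem is std::from_chars, which takes an explicit pointer range and so never looks for a null terminator. Below is a small sketch (to_int is a made-up helper, and rejecting trailing characters is a choice made for this example):

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse the whole view as a base-10 int.
// Returns std::nullopt on failure or if any characters are left over.
inline std::optional<int> to_int(std::string_view text) {
  int value = 0;
  const char* first = text.data();
  const char* last = text.data() + text.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc{} || ptr != last) {
    return std::nullopt;
  }
  return value;
}
```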
It is important to understand why a string_view
cannot provide a c_str()
member function. A string_view
is a non-owning view of a string
(or string like object). As it is only a view, it cannot change the underlying data in any way. If we were to create a string_view
from part of an existing string
, there is no way to insert a null terminator without mutating the underlying string
.
In the example below we can create a view of hello_world
, but can’t do anything to change it. If we wish to create a null terminated character sequence, we must make a copy (std::string hello = std::string(hello_view);
), and then call c_str()
explicitly.
std::string hello_world = "hello, world!";
std::string_view hello_view = hello_world; // must not outlive hello_world
hello_view.remove_suffix(8);
std::cout << hello_view << '\n'; // works
printf("%s\n", hello_view.data()); // breaks
Output:
hello
hello, world!
# "hello, world!"
#       ^
#
# can't add a null terminator here without modifying hello_world
See this Compiler Explorer snippet for a live demo.
There is no getting around needing to do some work in these situations, which is why it’s best not to mix use of string_view
with (legacy) APIs expecting null terminated strings. If you have a legacy API that you can’t change that expects a null terminated string, string_view
may not be the right tool for the job.
Consider string_view lifetimes carefully
string_view
is an easy type to misuse. On the surface it appears to provide the semantics of a value type. It can be cheaply copied and passed around without much overhead. Comparing two string_views
is like comparing two strings. The problem comes when we consider lifetimes. string_view
does not own the memory it refers to so it is incredibly easy to run into lifetime problems.
In general, unless there is a very good reason to, do not return a string_view
from a function. It is much safer to either return a string
by value (compilers are super smart at taking advantage of (Named) Return Value Optimization ((N)RVO)), or a const string&
, if returning the value of a member variable in a class.
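As a rough sketch of those two safer return styles (user_t and make_greeting are hypothetical names):

```cpp
#include <string>
#include <utility>

class user_t {
 public:
  explicit user_t(std::string name) : name_(std::move(name)) {}
  // A const reference to a member stays valid for as long as the object lives.
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// Returning by value hands ownership to the caller, so nothing can dangle.
// Compilers will typically apply NRVO here and elide the copy.
inline std::string make_greeting(const user_t& user) {
  std::string greeting = "hello, ";
  greeting += user.name();
  return greeting;
}
```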
Instead of thinking of string_view
as a value type, it’s better to think of it as a borrow type. Anytime you see one you should be absolutely sure that it won’t outlive the scope of the value it’s borrowing from. C++ unfortunately does not do a great job of protecting us from this (unlike Rust, which has made avoiding this type of issue significantly easier with the borrow checker).
A gotcha worth mentioning when it comes to string_view
and how it differs from references is lifetime extension. To demonstrate this let’s look at a small example:
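#include <iostream>
#include <string>
#include <string_view>

class my_str : public std::string {
public:
  // annotate constructor/destructor calls
  my_str(const std::string& str) : std::string(str) { std::cout << "my_str ctr\n"; }
  ~my_str() { std::cout << "my_str dtr\n"; }
};

// returns my_str so the logged destructor belongs to the temporary itself
my_str get_hello() {
  return my_str("A very long string that is not going to be able to use SSO");
}

int main(int, char**) {
  std::string_view hello = get_hello(); // the temporary dies at the end of this statement
  std::cout << hello << "\n"; // undefined behavior: hello now dangles
}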
To highlight when the constructor and destructor are called, we’ve publicly inherited from string
to annotate the lifetimes. When we call get_hello
, a temporary string is created and then immediately destroyed at the closing semicolon. The string_view
hello
is now a dangling reference to this temporary. If we try to print hello
, we’ll get a random value printed (we’re now in UB territory so who knows what might happen).
As we’ve included the logging for the constructor and destructor, we can see both are called before the string is printed.
my_str ctr
my_str dtr
�T8�¬Y�*�ng that is not going to be able to use SSO
If instead we’d used an old fashioned const string&
, then lifetime extension would have kicked in, ensuring hello
lives as long as the enclosing scope.
my_str ctr
A very long string that is not going to be able to use SSO
my_str dtr
See this Compiler Explorer snippet for a live demo.
As we can see from this example, it’s possible to get into trouble when using string_view
even as a local variable. One solution is to rely on auto
to deduce the correct types (string_view
member functions will return cheap string_view
objects, and for other functions we can use auto
or const auto&
to do the right thing). This is at the author’s discretion but can have some real world advantages.
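Here is a small sketch of how that plays out. The get_hello below is a simplified stand-in that returns a plain std::string, first_word_of_hello is a name invented for this example, and the static_asserts are only there to show what auto deduces.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// Stand-in for any function that returns an owning string by value.
inline std::string get_hello() {
  return "A very long string that is not going to be able to use SSO";
}

inline std::string first_word_of_hello() {
  // auto deduces std::string, so the result is owned and cannot dangle
  const auto hello = get_hello();
  static_assert(std::is_same_v<decltype(hello), const std::string>);

  // viewing a named string that is still alive is fine
  const std::string_view view = hello;
  // string_view member functions hand back cheap string_view objects
  const auto first_word = view.substr(0, view.find(' '));
  static_assert(std::is_same_v<decltype(first_word), const std::string_view>);

  // copy out before hello goes out of scope
  return std::string(first_word);
}
```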
Deliberation
string_view
is a useful type, but perhaps not as ubiquitous as initially thought. Consider the drawbacks and limitations very carefully when deciding whether to use it or not. string_view
is not a drop-in replacement for string
or const char*
. Think very carefully about the use-cases and consequences of applying it.
A good approach can be to start at leaf functions (functions with no calls of their own) that deal with string manipulation, and work your way up. This has the benefit of not dropping a string_view
in the middle of a call-chain that needs to then be converted back into a std::string
.
The C++ Core Guidelines have some interesting guidance as well. If you do really need a null terminated string, instead of using const char*
to denote that, use a type alias such as zstring
or czstring
, to explicitly indicate this is a null/zero terminated string. This acts to disambiguate a const char*
that might be pointing to a single character, or a character array without a null terminator (e.g. char hello[5] = { 'h', 'e', 'l', 'l', 'o'};
).
See this Compiler Explorer example.
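A minimal sketch of the idea, with the aliases defined locally to keep the snippet free of dependencies (the Guidelines Support Library provides equivalents as gsl::zstring and gsl::czstring). length_of is a hypothetical function.

```cpp
#include <cstddef>
#include <cstring>

// Local stand-ins for the aliases described in the C++ Core Guidelines.
using zstring = char*;         // mutable, null terminated
using czstring = const char*;  // immutable, null terminated

// The signature now documents that text must be null terminated.
inline std::size_t length_of(czstring text) {
  return std::strlen(text);
}
```

The alias changes nothing for the compiler, since it is still a const char*, but it tells the reader which kind of pointer is expected.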
Further Reading
There are a ton of great sources available about string_view
.
Talks
- CppCon 2018: Victor Ciura “Enough string_view to Hang Ourselves” - Probably the best and most balanced look at string_view.
- CppCon 2015: Marshall Clow “string_view” - A good intro and overview (though there’s a lot of audience questions/interruptions for this talk).
- StringViews, StringViews everywhere! - Marc Mutz - Meeting C++ 2017 - KDAB/Qt contributor talking about the use of string_view and QStringView.
- CppCon 2017: Nicolai Josuttis “C++ Templates Revised” - Referenced in Victor Ciura’s talk, references string_view and some of the pitfalls of using it.
- CppCon 2015: Neil MacIntosh “Evolving array_view and string_view for safe C++ code” - An interesting look at why string_view and array_view (span) types exist, and how they should be used.
Articles
- c_str-correctness - An excellent look at why string_view is a bad idea when interfacing with legacy C code (expecting null-terminated strings).
- std::string_view is a borrow type - A string_view introduction, introducing the concept of a ‘borrow’ type.
- std::string_view accepting temporaries: good idea or horrible pitfall? - Deep dive into some of the design implications of string_view.
- SL.str: String (C++ Core Guidelines) - Looks at string types as a whole (also references useful types in the Guideline Support Library (GSL)).
- C++ 17 Standard (Working Draft before final publication) - If you really want the details, go to the source.
Bonus
In researching this article I learned of many interesting pitfalls when it comes to working with character arrays in general. Here’s a brief collection of some of the gnarlier ones.
strncpy is awful
I stumbled across this article which has nothing to do with string_view
, but lots of interesting information about C style strings. One gotcha that’s easy to hit is the fact it’s possible to wind-up with non-null terminated strings when using strncpy
. The documentation states:
If count is reached before the entire string src was copied, the resulting character array is not null-terminated.
Which means to be safe, we should always add dst[count - 1] = '\0';
, otherwise the potential for reading past the end of a buffer is high.
Compiler Explorer link
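One way to make that habit hard to forget is to wrap it in a small helper. This is only a sketch, and copy_terminated is a made-up name:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy src into dst (which has room for count chars)
// and always leave dst null terminated, truncating src if necessary.
inline void copy_terminated(char* dst, const char* src, std::size_t count) {
  if (count == 0) {
    return;  // no room for even the terminator
  }
  std::strncpy(dst, src, count);
  dst[count - 1] = '\0';  // strncpy does not do this when src is too long
}
```

With a four character buffer, copying "hello" leaves "hel" plus the terminator instead of four characters with no terminator at all.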
C++ fixed a horrible problem in C
In C, it is legal to write const char hello[5] = "hello";
which results in a non-null terminated string. The good news is this is now actually a compile time error in C++.
From the C++ Standard:
11.6.2 Character arrays [dcl.init.string]
…
2. There shall not be more initializers than there are array elements. [ Example: char cv[4] = "asdf"; // error is ill-formed since there is no space for the implied trailing '\0'. — end example ]
3. If there are fewer initializers than there are array elements, each element not explicitly initialized shall be zero-initialized (11.6).
string_view and string
string
provides an implicit conversion operator to string_view
, so it can be passed directly to a function expecting a string_view
(string_view
does not have a constructor that accepts a string
). The string_view
itself will most likely be constructed using the public API of string
(it calls data()
on the string
which itself is guaranteed to be null terminated). Therefore it is technically possible to infer a string_view
may be null terminated in this case, but again there’s no guarantee, and no promise it won’t change in future, so attempting to rely on these details is risky, error prone and best avoided.
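To see the conversion in action, here is a small sketch with a hypothetical count_spaces function that only ever sees a string_view:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical function taking a string_view parameter.
// It relies only on the view's size, never on a null terminator.
inline std::size_t count_spaces(std::string_view text) {
  std::size_t count = 0;
  for (const char c : text) {
    if (c == ' ') {
      ++count;
    }
  }
  return count;
}
```

A std::string, a const char* and a string literal can all be passed to count_spaces directly. For the string case it is the conversion operator on string that produces the view.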
Footnotes
1. string_view intentionally has an API very close to that of string with a couple of useful extras such as remove_prefix and remove_suffix.