To string_view, or not to string_view

Published on: 2024-05-03

A look at the string_view type to better understand when, where, and why it should (or should not) be used.

By Tom Hulton-Harrop

Motivation

At first glance, string_view seems like an incredibly useful type that can be used in a wide variety of situations. Unfortunately, there are some sharp corners to string_view which need to be understood to use it effectively. This post hopes to give some guidance around the use of string_view to ensure our code is clear, efficient and secure.

Discussion

Use string_view as a parameter or local variable only

If you remember one thing about string_view from this post, this is the most important. string_view should be used (almost) exclusively as a function parameter or local variable (see the small caveat below as to why it’s ‘almost’).

It’s also valid to use string_view as a constant global variable (preferably marked constexpr), as here there are no lifetime issues. The string_view is almost certainly initialized from a const char*, which exists for the lifetime of the application. However, this only really makes sense if it will be used by an API expecting a string_view. If you must interface with a legacy API that expects a const char*, it’s preferable to store a global const char* const type instead.

If you have a routine that needs to do some string manipulation and does not call any other functions expecting a null terminated character sequence (a const char*), you’re golden. As a quick illustration, here is an example of some code to parse a string and extract several numbers from the sequence (taken from an Advent of Code puzzle).

Example input line:

"departure location: 47-874 or 885-960"

A function to extract digits:

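#include <charconv> // std::from_chars
#include <cstddef>
#include <cstdint>
#include <string>

struct range_t {
    int64_t begin;
    int64_t end;
};

// extracts the first "begin-end" pair that follows the identifier
const auto range_fn_str = [](std::size_t identifier_offset, const std::string& line) {
    // everything after the ':' (every substr call here returns a new std::string)
    auto ranges = line.substr(identifier_offset + 1);
    auto start = ranges.find(' ');
    auto mid = ranges.find('-');
    auto end = ranges.find(' ', start + 1);
    // each piece may keep a trailing '-' or ' '; from_chars stops at the first non-digit
    auto left = ranges.substr(start + 1, mid - start);
    auto right = ranges.substr(mid + 1, end - mid);
    int64_t begin_number = 0, end_number = 0;
    std::from_chars(left.data(), left.data() + left.size(), begin_number);
    std::from_chars(right.data(), right.data() + right.size(), end_number);
    return range_t{begin_number, end_number};
};

// calling code (line is a std::string, ranges is a std::vector<range_t>)
if (auto identifier = line.find(':'); identifier != std::string::npos) {
    ranges.push_back(range_fn_str(identifier, line));
}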

The code isn’t particularly clever, and it’s not important to understand exactly what it’s doing. What we should focus on are the calls to substr. With std::string, a call to substr returns a copy of the sub-string (another string). If the string is small, we’ll benefit from the small string optimization (SSO), but with a longer string we have to go to the heap to allocate memory and construct a new object. When the variable goes out of scope, we then have to destroy and deallocate it.

If we change the string function parameter to instead accept a string_view, substr no longer returns a string, but a string_view (see footnote 1). This involves no heap allocations at all; we are simply adjusting a pointer and a size value under the hood. If ownership of the string does not change (and we aren’t storing the string_view anywhere), we can safely use these much more efficient functions. The initial string version is actually a pessimization of what we need.

To show what an impact this can have, let’s compare each version using QuickBench. The string_view version performs 1.8x faster than the string version simply by changing a single type. This is where string_view really shines.
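For comparison, a string_view version of the function might look like the sketch below. The name range_fn_sv is an assumption made for illustration (only the string version is named above), and it is written as a free function rather than a lambda. The body is otherwise the same: only the parameter type changes, so each substr call now yields a string_view.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>

struct range_t {
    std::int64_t begin;
    std::int64_t end;
};

// hypothetical name: string_view counterpart of range_fn_str
inline range_t range_fn_sv(std::size_t identifier_offset, std::string_view line) {
    // substr on a string_view adjusts a pointer and a size, with no allocation
    auto ranges = line.substr(identifier_offset + 1);
    auto start = ranges.find(' ');
    auto mid = ranges.find('-');
    auto end = ranges.find(' ', start + 1);
    auto left = ranges.substr(start + 1, mid - start);
    auto right = ranges.substr(mid + 1, end - mid);
    std::int64_t begin_number = 0, end_number = 0;
    // from_chars takes a [first, last) range, so no null terminator is needed
    std::from_chars(left.data(), left.data() + left.size(), begin_number);
    std::from_chars(right.data(), right.data() + right.size(), end_number);
    return range_t{begin_number, end_number};
}
```

Calling range_fn_sv(line.find(':'), line) with the example input line yields the first range, 47 to 874. A std::string can be passed for line unchanged, as it converts implicitly to string_view.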

string_view and null terminated strings

string_view is not guaranteed to be null terminated. Any code that assumes string_view is null terminated is fragile, bug prone and dangerous. If a situation arises where a string_view needs to be passed to a legacy function, the safest option is to first construct a string from it using an explicit string constructor, and then call .c_str() on that instance.
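As a minimal sketch of that last point: legacy_length below is a hypothetical stand-in for a legacy function (it simply wraps std::strlen), and call_legacy is an assumed name for the code that bridges the two worlds.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// hypothetical stand-in for a legacy API that expects a null-terminated string
inline std::size_t legacy_length(const char* str) {
    return std::strlen(str);
}

// hypothetical bridge: copy the view into a string before calling the legacy API
inline std::size_t call_legacy(std::string_view view) {
    // explicit construction; std::string guarantees c_str() is null-terminated
    const std::string owned(view);
    return legacy_length(owned.c_str());
}
```

The copy is the price paid for the terminator. Passing view.data() directly would read past the end of the view whenever it refers to only part of a larger buffer.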

Aside: A better option would be to update the function accepting a string_view and calling a legacy API to take either a string or const char*, depending on the situation. It is possible to use const string&, however this inhibits the compiler from moving strings that don’t use the short string optimization. Accepting strings by value allows efficient moving of values. If null-termination is required and the string doesn’t need ownership, it would be better to accept a const char*, instead of a const string&, because the const string& can create an unnecessary heap allocation when the string has to be constructed.

One source of confusion and complexity around string_view is if we construct a string_view from a string literal like so:

std::string_view greeting = "good morning";

This specific instance of string_view will be null terminated, purely by the fact that it happens to point to a complete C string.

Quoting from the C++ standard:

5.13.5 String literals [lex.string]

14. After any necessary concatenation, in translation phase 7 (5.2), '\0' is appended to every string literal so that programs that scan a string can find its end.

Relying on this is incredibly dangerous though, as in future greeting could be updated to be constructed in another way where the null terminator is not present. We must assume the null terminator is never there when dealing with string_view. When working with existing library code (e.g. atoi and stoi), string_view is best avoided entirely.
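Where a number really does need to be parsed from a string_view, one option that avoids the terminator question is std::from_chars (the same function used in the range example earlier), since it takes an explicit [first, last) range. The helper name parse_int is hypothetical and the all-input-must-be-consumed policy is an assumption of this sketch.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// hypothetical helper: parse an entire string_view as a base-10 int
inline std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    // succeed only if there was no error and every character was consumed
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

Because the end of the range is passed explicitly, this works on a view into the middle of a larger buffer, which is exactly where atoi would read too far.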

The C++ standard also has this to say:

24.4.2.4 Element access [string.view.access]

14. [Note: Unlike basic_string::data() and string literals, data() may return a pointer to a buffer that is not null-terminated. Therefore it is typically a mistake to pass data() to a function that takes just a const charT* and expects a null-terminated string. — end note]

It is important to understand why a string_view cannot provide a c_str() member function. A string_view is a non-owning view of a string (or string like object). As it is only a view, it cannot change the underlying data in any way. If we were to create a string_view from part of an existing string, there is no way to insert a null terminator without mutating the underlying string.

In the example below we can create a view of hello_world, but can’t do anything to change it. If we wish to create a null terminated character sequence, we must make a copy (std::string hello = std::string(hello_view);), and then call c_str() explicitly.

std::string hello_world = "hello, world!";
std::string_view hello_view = hello_world; // must not outlive hello_world
hello_view.remove_suffix(8);
std::cout << hello_view << '\n'; // works
printf("%s\n", hello_view.data()); // breaks

Output:

hello
hello, world!
# "hello, world!"
#       ^
#       can't add a null terminator here without modifying hello_world

See this Compiler Explorer snippet for a live demo.

There is no getting around needing to do some work in these situations, which is why it’s best not to mix use of string_view with (legacy) APIs expecting null terminated strings. If you have a legacy API that you can’t change that expects a null terminated string, string_view may not be the right tool for the job.

Consider string_view lifetimes carefully

string_view is an easy type to misuse. On the surface it appears to provide the semantics of a value type. It can be cheaply copied and passed around without much overhead. Comparing two string_views is like comparing two strings. The problem comes when we consider lifetimes. string_view does not own the memory it refers to so it is incredibly easy to run into lifetime problems.

In general, unless there is a very good reason to, do not return a string_view from a function. It is much safer to either return a string by value (compilers are super smart at taking advantage of (Named) Return Value Optimization ((N)RVO)), or a const string&, if returning the value of a member variable in a class.
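A small sketch of the two safer return styles. The class user_t and its members are hypothetical and exist only for illustration.

```cpp
#include <string>
#include <utility>

// hypothetical class used only to illustrate the two return styles
class user_t {
public:
    explicit user_t(std::string name) : name_(std::move(name)) {}

    // returning a member: a const reference, valid for as long as the object lives
    const std::string& name() const { return name_; }

    // returning a freshly built value: by value, so the caller owns the result
    std::string greeting() const { return "hello, " + name_; }

private:
    std::string name_;
};
```

Had greeting() returned a string_view of the temporary it builds, the caller would receive a dangling view as soon as the function returned.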

Instead of thinking of string_view as a value type, it’s better to think of it as a borrow type. Anytime you see one you should be absolutely sure that it won’t outlive the scope of the value it’s borrowing from. C++ unfortunately does not do a great job of protecting us from this (unlike Rust, which has made avoiding this type of issue significantly easier with the borrow checker).

A gotcha worth mentioning when it comes to string_view and how it differs from references is lifetime extension. To demonstrate this let’s look at a small example:

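#include <iostream>
#include <string>
#include <string_view>

class my_str : public std::string {
public:
    // annotate constructor/destructor calls
    my_str(const std::string& str) : std::string(str) {
        std::cout << "my_str ctr\n";
    }
    ~my_str() { std::cout << "my_str dtr\n"; }
};

// returns my_str by value so the logged destructor belongs to the temporary seen in main
my_str get_hello() {
    return my_str("A very long string that is not going to be able to use SSO");
}

int main(int, char**) {
    // the temporary returned by get_hello is destroyed at the end of this statement
    std::string_view hello = get_hello();
    std::cout << hello << "\n"; // undefined behavior: hello dangles
}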

To highlight when the constructor and destructor are called, we’ve publicly inherited from string to annotate the lifetimes. When we call get_hello, a temporary string is created and then destroyed at the end of the full expression (the closing semicolon). The string_view hello is now a dangling view of this temporary. If we try to print hello, the behavior is undefined; in practice we may see garbage printed, but anything could happen.

As we’ve included the logging for the constructor and destructor, we can see both are called before the string is printed.

my_str ctr
my_str dtr
�T8�¬Y�*�ng that is not going to be able to use SSO

If instead we’d used an old fashioned const string&, then lifetime extension would have kicked in, ensuring hello lives as long as the enclosing scope.

my_str ctr
A very long string that is not going to be able to use SSO
my_str dtr

See this Compiler Explorer snippet for a live demo.

As we can see from this example, it’s possible to get into trouble when using string_view even as a local variable. One solution is to rely on auto to deduce the correct types (string_view member functions will return cheap string_view objects, and for other functions we can use auto or const auto& to do the right thing). This is at the author’s discretion but can have some real-world advantages.
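A sketch of what relying on auto can look like. The function names make_greeting and first_word_of_greeting are hypothetical and chosen only for this example.

```cpp
#include <string>
#include <string_view>

// hypothetical function returning an owning string by value
inline std::string make_greeting() {
    return "A very long string that is not going to be able to use SSO";
}

inline std::string first_word_of_greeting() {
    // auto deduces std::string, so greeting owns its characters
    auto greeting = make_greeting();
    // viewing a named local is fine while that local is alive
    std::string_view view = greeting;
    // auto deduces std::string_view, because string_view::substr returns a view
    auto first_word = view.substr(0, view.find(' '));
    // copy the characters out before greeting is destroyed
    return std::string(first_word);
}
```

With auto on the first line there is no opportunity to accidentally bind a string_view to the temporary, which is the mistake made in the example above.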

Deliberation

string_view is a useful type, but perhaps not as ubiquitous as initially thought. Consider the drawbacks and limitations very carefully when deciding whether to use it or not. string_view is not a drop-in replacement for string or const char*. Think very carefully about the use-cases and consequences of applying it.

A good approach can be to start at leaf functions (functions with no calls of their own) that deal with string manipulation, and work your way up. This has the benefit of not dropping a string_view in the middle of a call-chain that needs to then be converted back into a std::string.

The C++ Core Guidelines have some interesting guidance as well. If you do really need a null terminated string, instead of using const char* to denote that, use a type alias such as zstring or czstring, to explicitly indicate this is a null/zero terminated string. This acts to disambiguate a const char* that might be pointing to a single character, or a character array without a null terminator (e.g. char hello[5] = { 'h', 'e', 'l', 'l', 'o'};).
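As a rough sketch of the idea, the alias below is declared locally so the example has no dependencies; this is an assumption of the sketch, and a project following the guidelines would more likely take the alias from a support library. The functions length_of and is_dash are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// local alias mirroring the guideline's czstring (declared here for illustration)
using czstring = const char*;

// the parameter type documents that str must be null-terminated
inline std::size_t length_of(czstring str) {
    return std::strlen(str);
}

// a plain const char* parameter: here it points at exactly one character
inline bool is_dash(const char* c) {
    return *c == '-';
}
```

The alias changes nothing for the compiler, as czstring and const char* are the same type. Its value is purely as documentation of intent at the call site and in the signature.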

See this Compiler Explorer example.

Bonus

In researching this article I learned of many interesting pitfalls when it comes to working with character arrays in general. Here’s a brief collection of some of the gnarlier ones.

strncpy is awful

I stumbled across this article which has nothing to do with string_view, but has lots of interesting information about C-style strings. One gotcha that’s easy to hit is that it’s possible to wind up with non-null-terminated strings when using strncpy. The documentation states:

If count is reached before the entire string src was copied, the resulting character array is not null-terminated.

This means that, to be safe, we should always add dst[count - 1] = '\0'; after the call, otherwise the potential for reading past the end of a buffer is high.
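Wrapped up as a helper, the pattern might look like this. The name copy_terminated is hypothetical, and the sketch assumes count is the full capacity of dst and is greater than zero.

```cpp
#include <cstddef>
#include <cstring>

// hypothetical helper: copy src into dst (capacity count) and always terminate
// assumes count > 0
inline void copy_terminated(char* dst, const char* src, std::size_t count) {
    std::strncpy(dst, src, count);
    // strncpy leaves dst unterminated when src has count or more characters
    dst[count - 1] = '\0';
}
```

The result is silently truncated when src does not fit, so callers that care about truncation still need to check the source length themselves.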

Compiler Explorer link

C++ fixed a horrible problem in C

In C, it is legal to write const char hello[5] = "hello"; which results in a non-null terminated string. The good news is this is now actually a compile time error in C++.

From the C++ Standard:

11.6.2 Character arrays [dcl.init.string]

2. There shall not be more initializers than there are array elements. [ Example: char cv[4] = "asdf"; // error is ill-formed since there is no space for the implied trailing '\0'. — end example ]
3. If there are fewer initializers than there are array elements, each element not explicitly initialized shall be zero-initialized (11.6).

string_view and string

string provides an implicit conversion operator to string_view, so it can be passed directly to a function expecting a string_view (string_view does not have a constructor that accepts a string). The string_view itself will most likely be constructed using the public API of string (it calls data() on the string which itself is guaranteed to be null terminated). Therefore it is technically possible to infer a string_view may be null terminated in this case, but again there’s no guarantee, and no promise it won’t change in future, so attempting to rely on these details is risky, error prone and best avoided.
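A short sketch of the conversion in action. The function count_spaces is hypothetical; any function taking a string_view parameter would do.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// hypothetical function taking a string_view parameter
inline std::size_t count_spaces(std::string_view text) {
    std::size_t count = 0;
    for (const char c : text) {
        if (c == ' ') {
            ++count;
        }
    }
    return count;
}
```

Both count_spaces(some_std_string) and count_spaces("a literal") compile without any explicit conversion: the first goes through the conversion operator on string, the second through the string_view constructor taking a const char*. The function itself only ever uses the view's size, never a terminator.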

Footnotes

  1. string_view intentionally has an API very close to that of string with a couple of useful extras such as remove_prefix and remove_suffix.
