
PEP: 3101
Title: Advanced String Formatting
Author: Talin <viridia at gmail.com>
Status: Final
Type: Standards Track
Created: 16-Apr-2006
Python-Version: 3.0
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2007, 14-Aug-2007, 14-Sep-2008


This PEP proposes a new system for built-in string formatting operations, intended as a replacement for the existing '%' string formatting operator.

Python currently provides two methods of string interpolation:

  • The '%' operator for strings. [1]
  • The string.Template module. [2]

The primary scope of this PEP concerns proposals for built-in string formatting operations (in other words, methods of the built-in string type).

The '%' operator is primarily limited by the fact that it is a binary operator, and therefore can take at most two arguments. One of those arguments is already dedicated to the format string, leaving all other variables to be squeezed into the remaining argument. The current practice is to use either a dictionary or a tuple as the second argument, but as many people have commented [3], this lacks flexibility. The 'all or nothing' approach (meaning that one must choose between only positional arguments, or only named arguments) is felt to be overly constraining.

While there is some overlap between this proposal and string.Template, it is felt that each serves a distinct need, and that one does not obviate the other. This proposal is for a mechanism which, like '%', is efficient for small strings which are only used once, so, for example, compilation of a string into a template is not contemplated in this proposal, although the proposal does take care to define format strings and the API in such a way that an efficient template package could reuse the syntax and even some of the underlying formatting code.

The specification will consist of the following parts:

  • Specification of a new formatting method to be added to the built-in string class.
  • Specification of functions and flag values to be added to the string module, so that the underlying formatting engine can be used with additional options.
  • Specification of a new syntax for format strings.
  • Specification of a new set of special methods to control the formatting and conversion of objects.
  • Specification of an API for user-defined formatting classes.
  • Specification of how formatting errors are handled.

Note on string encodings: When discussing this PEP in the context of Python 3.0, it is assumed that all strings are unicode strings, and that the use of the word 'string' in the context of this document will generally refer to a Python 3.0 string, which is the same as the Python 2.x unicode object.

In the context of Python 2.x, the use of the word 'string' in this document refers to an object which may either be a regular string or a unicode object. All of the function call interfaces described in this PEP can be used for both strings and unicode objects, and in all cases there is sufficient information to be able to properly deduce the output string type (in other words, there is no need for two separate APIs). In all cases, the type of the format string dominates - that is, the result of the conversion will always result in an object that contains the same representation of characters as the input format string.

String Methods

The built-in string class (and also the unicode class in 2.6) will gain a new method, 'format', which takes an arbitrary number of positional and keyword arguments:

    "The story of {0}, {1}, and {c}".format(a, b, c=d)

Within a format string, each positional argument is identified with a number, starting from zero, so in the above example, 'a' is argument 0 and 'b' is argument 1. Each keyword argument is identified by its keyword name, so in the above example, 'c' is used to refer to the third argument.
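A minimal runnable illustration of positional and keyword fields; the string values are arbitrary and chosen only for demonstration. It also shows that a positional field may be referenced more than once and in any order.

```python
# Positional fields are numbered from zero; keyword fields use their names.
story = "The story of {0}, {1}, and {c}".format("Alice", "Bob", c="Carol")
print(story)  # The story of Alice, Bob, and Carol

# The same positional argument can be referenced repeatedly, in any order.
echo = "{1} then {0} then {1}".format("x", "y")
print(echo)  # y then x then y
```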

There is also a global built-in function, 'format', which formats a single value.

This function is described in a later section.

Format Strings

Format strings consist of intermingled character data and markup.

Character data is data which is transferred unchanged from the format string to the output string; markup is not transferred from the format string directly to the output, but instead is used to define 'replacement fields' that describe to the format engine what should be placed in the output string in place of the markup.

Brace characters ('curly braces') are used to indicate a replacement field within the string:

The result of this is the string:

Braces can be escaped by doubling:

Which would produce:
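A short sketch of both rules, with an arbitrary argument value: a single pair of braces is replaced by the argument, while doubled braces are emitted as literal brace characters.

```python
# A single pair of braces marks a replacement field.
greeting = "My name is {0}".format("Fred")
print(greeting)  # My name is Fred

# '{{' and '}}' outside a field produce literal '{' and '}'.
escaped = "My name is {0} :-{{}}".format("Fred")
print(escaped)  # My name is Fred :-{}
```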

The element within the braces is called a 'field'. Fields consist of a 'field name', which can either be simple or compound, and an optional 'format specifier'.

Simple and Compound Field Names

Simple field names are either names or numbers. If numbers, they must be valid base-10 integers; if names, they must be valid Python identifiers. A number is used to identify a positional argument, while a name is used to identify a keyword argument.

A compound field name is a combination of multiple simple field names in an expression:

This example shows the use of the 'getattr' or 'dot' operator in a field expression. The dot operator allows an attribute of an input value to be specified as the field value.

Unlike some other programming languages, you cannot embed arbitrary expressions in format strings. This is by design: the types of expressions that you can use are deliberately limited. Only two operators are supported: the '.' (getattr) operator, and the '[]' (getitem) operator. The reason for allowing these operators is that they don't normally have side effects in non-pathological code.
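A small sketch of the 'dot' operator, using a throwaway SimpleNamespace object as hypothetical input data. It also shows that an arithmetic expression is not evaluated: in CPython, a field name such as '0+1' is not all digits, so it is treated as a keyword-argument name and the lookup fails.

```python
from types import SimpleNamespace

# Hypothetical example object with two attributes.
person = SimpleNamespace(name="Fred", age=42)

# '.' performs getattr() on the argument before formatting.
line = "My name is {0.name}, age {0.age}".format(person)
print(line)  # My name is Fred, age 42

# No expression evaluation: '0+1' becomes a keyword name, so KeyError.
try:
    "{0+1}".format(1)
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError:", exc)  # KeyError: '0+1'
```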

An example of the 'getitem' syntax:

It should be noted that the use of 'getitem' within a format string is much more limited than its conventional usage. In the above example, the string 'name' really is the literal string 'name', not a variable named 'name'. The rules for parsing an item key are very simple. If it consists entirely of digits, then it is treated as a number; otherwise it is used as a string.

Because keys are not quote-delimited, it is not possible to specify arbitrary dictionary keys (e.g., the strings '10' or ':-]') from within a format string.
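A sketch of 'getitem' lookups with arbitrary sample data. Because '10' is all digits, the key is converted to the integer 10, so a dictionary entry stored under the string '10' cannot be reached from the format string.

```python
data = {"name": "Fred", "10": "ten as a string"}

# A non-numeric key is used as a literal string, not as a variable name.
by_name = "My name is {0[name]}".format(data)
print(by_name)  # My name is Fred

# A numeric key indexes sequences as an integer.
by_index = "Second item: {0[1]}".format(["a", "b", "c"])
print(by_index)  # Second item: b

# '10' is looked up as the integer 10, which is not a key in data.
try:
    "{0[10]}".format(data)
except KeyError as exc:
    lookup_key = exc.args[0]
    print("KeyError:", exc)  # KeyError: 10
```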

Implementation note: The implementation of this proposal is not required to enforce the rule about a simple or dotted name being a valid Python identifier. Instead, it will rely on the getattr function of the underlying object to throw an exception if the identifier is not legal. The str.format() function will have a minimalist parser which only attempts to figure out when it is 'done' with an identifier (by finding a '.' or a ']', or '}', etc.).

Format Specifiers

Each field can also specify an optional set of 'format specifiers' which can be used to adjust the format of that field. Format specifiers follow the field name, with a colon (':') character separating the two:

The meaning and syntax of the format specifiers depend on the type of object that is being formatted, but there is a standard set of format specifiers used for any object that does not override them.
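A minimal sketch of a width specifier applied to two built-in types (values are arbitrary). In CPython, strings are left-aligned and numbers right-aligned by default, which illustrates that the same specifier is interpreted by the type being formatted.

```python
# Width 8: the str is padded on the right, the int on the left.
left = "My name is {0:8}".format("Fred")
right = "{0:8}".format(42)
print(repr(left))   # 'My name is Fred    '
print(repr(right))  # '      42'
```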

Format specifiers can themselves contain replacement fields. For example, a field whose field width is itself a parameter could be specified via:

These 'internal' replacement fields can only occur in the format specifier part of the replacement field. Internal replacement fields cannot themselves have format specifiers. This also implies that replacement fields cannot be nested to arbitrary levels.

Note that the doubled '}' at the end, which would normally be escaped, is not escaped in this case. The reason is that the '{{' and '}}' syntax for escapes is only applied when used outside of a format field. Within a format field, the brace characters always have their normal meaning.
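A sketch of internal replacement fields, where the width and precision come from other positional arguments (values are arbitrary). The last line contrasts this with escaping outside a field, where '{{' and '}}' do produce literal braces.

```python
# Width taken from argument 1.
padded = "{0:{1}}".format("Fred", 10)
print(repr(padded))  # 'Fred      '

# Width and precision both supplied as arguments.
pi_field = "{0:{1}.{2}f}".format(3.14159, 8, 2)
print(repr(pi_field))  # '    3.14'

# Outside a field, doubled braces are escapes around a normal field.
wrapped = "{{{0}}}".format("x")
print(wrapped)  # {x}
```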

The syntax for format specifiers is open-ended, since a class can override the standard format specifiers. In such cases, the str.format() method merely passes all of the characters between the first colon and the matching brace to the relevant underlying formatting method.

Standard Format Specifiers

If an object does not define its own format specifiers, a standard set of format specifiers is used. These are similar in concept to the format specifiers used by the existing '%' operator; however, there are also a number of differences.

The general form of a standard format specifier is:

    [[fill]align][sign][#][0][width][.precision][type]

The brackets ([]) indicate an optional element.

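The optional align flag can be one of the following:

  • '<' - Forces the field to be left-aligned within the available space.
  • '>' - Forces the field to be right-aligned within the available space.
  • '=' - Forces the padding to be placed after the sign (if any) but before the digits, as in '+000000120'. This alignment option is only valid for numeric types.
  • '^' - Forces the field to be centered within the available space.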

Note that unless a minimum field width is defined, the field width will always be the same size as the data to fill it, so that the alignment option has no meaning in this case.

The optional 'fill' character defines the character to be used to pad the field to the minimum width. The fill character, if present, must be followed by an alignment flag.

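The 'sign' option is only valid for numeric types, and can be one of the following:

  • '+' - indicates that a sign should be used for both positive and negative numbers.
  • '-' - indicates that a sign should be used only for negative numbers (this is the default behavior).
  • ' ' - indicates that a leading space should be used on positive numbers, and a minus sign on negative numbers.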

If the '#' character is present, integers use the 'alternate form' for formatting. This means that binary, octal, and hexadecimal output will be prefixed with '0b', '0o', and '0x', respectively.

'width' is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content.

If the width field is preceded by a zero ('0') character, this enables zero-padding. This is equivalent to an alignment type of '=' and a fill character of '0'.

The 'precision' is a decimal number indicating how many digits should be displayed after the decimal point in a floating point conversion. For non-numeric types the field indicates the maximum field size; in other words, how many characters will be used from the field content. The precision is ignored for integer conversions.
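A sketch exercising each component of a standard format specifier described above, with arbitrary values and outputs as produced by CPython. One divergence is worth knowing: current CPython rejects a precision combined with an integer presentation type (for example '.2d') with a ValueError rather than ignoring it, which the last lines demonstrate.

```python
centred = "{0:*^11}".format("mid")                     # fill '*', align '^', width 11
signs = "{0:+} {1:+} [{2: }]".format(5, -5, 5)         # '+', and ' ' for positives
sign_aware = "{0:0=8}".format(-42)                     # '=' pads between sign and digits
zeroed = "{0:08.3f}".format(-3.14159)                  # leading '0' on the width
prefixed = "{0:#x} {0:#o} {0:#b}".format(10)           # '#' alternate form
trimmed = "{0:.2f} {1:.3}".format(2.71828, "abcdef")   # precision for float and str

for text in (centred, signs, sign_aware, zeroed, prefixed, trimmed):
    print(repr(text))
# '****mid****'
# '+5 -5 [ 5]'
# '-0000042'
# '-003.142'
# '0xa 0o12 0b1010'
# '2.72 abc'

# Precision with an integer presentation type is an error in CPython.
try:
    "{0:.2d}".format(5)
    int_precision_rejected = False
except ValueError:
    int_precision_rejected = True
print(int_precision_rejected)  # True
```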

Finally, the 'type' determines how the data should be presented.

The available integer presentation types are:

The available floating point presentation types are:
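A sketch of the common integer and floating point presentation types as implemented in CPython (the sample numbers are arbitrary). The locale-dependent 'n' type is left out because its output varies by environment.

```python
# Integer presentation types: binary, octal, decimal, hex, HEX, character.
ints = "{0:b} {0:o} {0:d} {0:x} {0:X} {1:c}".format(255, 65)
print(ints)  # 11111111 377 255 ff FF A

# Floating point presentation types; the default precision is 6.
floats = "{0:e} {0:E} {0:f} {0:g}".format(1234.5678)
print(floats)  # 1.234568e+03 1.234568E+03 1234.567800 1234.57

# Percentage: multiply by 100 and append '%'.
percents = "{0:%} {0:.1%}".format(0.25)
print(percents)  # 25.000000% 25.0%
```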

Objects are able to define their own format specifiers to replace the standard ones. An example is the 'datetime' class, whose format specifiers might look something like the arguments to the strftime() function:
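In current Python, datetime objects do accept strftime-style codes as their format specifier. The timestamp below is arbitrary; note that the colon inside the specifier is fine, because everything after the first colon up to the closing brace is passed to the object.

```python
from datetime import datetime

created = datetime(2006, 4, 16, 9, 30)  # arbitrary example timestamp

stamp = "Created: {0:%Y-%m-%d %H:%M}".format(created)
print(stamp)  # Created: 2006-04-16 09:30

# An empty specifier falls back to str(value).
plain = "{0}".format(created)
print(plain)  # 2006-04-16 09:30:00
```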

For all built-in types, an empty format specification will produce the equivalent of str(value). It is recommended that objects defining their own format specifiers follow this convention as well.

Explicit Conversion Flag

The explicit conversion flag is used to transform the format field value before it is formatted. This can be used to override the type-specific formatting behavior, and format the value as if it were a more generic type. Currently, two explicit conversion flags are recognized:

  • !r - convert the value to a string using repr().
  • !s - convert the value to a string using str().

These flags are placed before the format specifier:

In the preceding example, the string 'Hello' will be printed, with quotes, in a field of at least 20 characters width.

A custom Formatter class can define additional conversion flags. The built-in formatter will raise a ValueError if an invalid conversion flag is specified.
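A sketch of the conversion flags as str.format handles them in CPython. The string 'Hello' and the width of 20 follow the description above; '!x' is simply an arbitrary invalid flag.

```python
# '!r' applies repr() first, so the quotes become part of the padded text.
quoted = "{0!r:20}".format("Hello")
print(quoted + "|")
print(len(quoted))  # 20

# '!s' applies str() first.
as_str = "{0!s}".format(3.5)
print(as_str)  # 3.5

# An unknown conversion flag is rejected with ValueError.
try:
    "{0!x}".format("Hello")
    bad_flag_rejected = False
except ValueError as exc:
    bad_flag_rejected = True
    print("ValueError:", exc)
```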

Controlling Formatting on a Per-Type Basis

Each Python type can control formatting of its instances by defining a __format__ method. The __format__ method is responsible for interpreting the format specifier, formatting the value, and returning the resulting string.

The new, global built-in function 'format' simply calls this special method, similar to how len() and str() simply call their respective special methods:
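A minimal check of that dispatch: calling the built-in format() gives the same result as calling the type's __format__ directly. The value 3.14159 is arbitrary. Formatting None with an empty specifier also works, as the next paragraph notes.

```python
value = 3.14159

# format(value, spec) dispatches to type(value).__format__(value, spec).
via_builtin = format(value, ".2f")
via_method = type(value).__format__(value, ".2f")
print(via_builtin, via_method)  # 3.14 3.14

# None is an ordinary object, so this is safe and yields str(None).
none_text = format(None, "")
print(none_text)  # None
```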

It is safe to call this function with a value of 'None' (because the'None' value in Python is an object and can have methods.)

Several built-in types, including 'str', 'int', 'float', and 'object', define __format__ methods. This means that if you derive from any of those types, your class will know how to format itself.

The object.__format__ method is the simplest: it simply converts the object to a string, and then calls format again:
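A sketch of the behavior described here, written as a hypothetical helper, pep_object_format, since modern CPython differs: from Python 3.4 on, object.__format__ still handles an empty specifier by returning str(self), but raises TypeError for any non-empty specifier.

```python
class Plain:
    """A class that only defines __str__ and inherits object.__format__."""
    def __str__(self):
        return "plain"


def pep_object_format(obj, format_spec):
    # Hypothetical helper mirroring the text: convert to str, then format it.
    return format(str(obj), format_spec)


described = pep_object_format(Plain(), ">8")
print(repr(described))  # '   plain'

# With an empty spec, CPython's object.__format__ returns str(self).
empty_spec = format(Plain(), "")
print(empty_spec)  # plain

# Current CPython rejects a non-empty spec on a plain object.
try:
    format(Plain(), ">8")
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```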

The __format__ methods for 'int' and 'float' will do numeric formatting based on the format specifier. In some cases, these formatting operations may be delegated to other types. So for example, in the case where the 'int' formatter sees a format type of 'f' (meaning 'float'), it can simply cast the value to a float and call format() again.

Any class can override the __format__ method to provide custom formatting for that type:
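A sketch of a custom __format__ using a hypothetical Temperature class. Its specifier syntax is invented for illustration: an optional leading unit letter ('C' or 'F'), followed by an ordinary float specifier that is passed on to the built-in format().

```python
class Temperature:
    """Hypothetical type: spec is an optional unit letter ('C' or 'F')
    followed by a standard float format specifier."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, format_spec):
        unit, rest = "C", format_spec
        if format_spec[:1] in ("C", "F"):
            unit, rest = format_spec[0], format_spec[1:]
        value = self.celsius if unit == "C" else self.celsius * 9 / 5 + 32
        # Delegate the remaining spec to float formatting (default: 1 decimal).
        return format(float(value), rest or ".1f") + unit


print("{0}".format(Temperature(21.5)))      # 21.5C
print("{0:F.0f}".format(Temperature(100)))  # 212F
```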

Note for Python 2.x: The 'format_spec' argument will be either a string object or a unicode object, depending on the type of the original format string. The __format__ method should test the type of the format_spec parameter to determine whether to return a string or unicode object. It is the responsibility of the __format__ method to return an object of the proper type.

Note that the 'explicit conversion' flag mentioned above is not passed to the __format__ method. Rather, it is expected that the conversion specified by the flag will be performed before calling __format__.
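A sketch that makes this ordering visible with a hypothetical class whose __format__ output is easy to recognize. With '!r', repr() runs first, so the resulting str is what gets formatted, and the class's own __format__ is never called.

```python
class Loud:
    """Hypothetical class whose __format__ output is easy to spot."""

    def __format__(self, format_spec):
        return "LOUD[" + format_spec + "]"

    def __repr__(self):
        return "Loud()"


# Without a conversion flag, the spec reaches Loud.__format__.
direct = "{0:abc}".format(Loud())
print(direct)  # LOUD[abc]

# With '!r', repr() is applied first and str.__format__ handles '>8'.
converted = "{0!r:>8}".format(Loud())
print(repr(converted))  # '  Loud()'
```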

User-Defined Formatting

There will be times when customizing the formatting of fields on a per-type basis is not enough. An example might be a spreadsheet application, which displays hash marks '#' when a value is too large to fit in the available space.

For more powerful and flexible formatting, access to the underlying format engine can be obtained through the 'Formatter' class that lives in the 'string' module. This class takes additional options which are not accessible via the normal str.format method.

An application can subclass the Formatter class to create its own customized formatting behavior.

The PEP does not attempt to exactly specify all methods and properties defined by the Formatter class; instead, those will be defined and documented in the initial implementation. However, this PEP will specify the general requirements for the Formatter class, which are listed below.

Although str.format() does not directly use the Formatter class to do formatting, both use the same underlying implementation. The reason that str.format() does not use the Formatter class directly is that 'str' is a built-in type, which means that all of its methods must be implemented in C, whereas Formatter is a Python class. Formatter provides an extensible wrapper around the same C functions as are used by str.format().

Formatter Methods

The Formatter class takes no initialization arguments:

The public API methods of class Formatter are as follows:

  • format(format_string, *args, **kwargs)
  • vformat(format_string, args, kwargs)

'format' is the primary API method. It takes a format template, and an arbitrary set of positional and keyword arguments. 'format' is just a wrapper that calls 'vformat'.

'vformat' is the function that does the actual work of formatting. It is exposed as a separate function for cases where you want to pass in a predefined dictionary of arguments, rather than unpacking and repacking the dictionary as individual arguments using the *args and **kwds syntax. 'vformat' does the work of breaking up the format template string into character data and replacement fields. It calls 'get_value' and the other overridable methods as appropriate (described below).
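A minimal sketch using the standard library's string.Formatter (the sample arguments are arbitrary): format() takes unpacked arguments, while vformat() takes the tuple and dictionary as they are.

```python
from string import Formatter

fmt = Formatter()  # no initialization arguments
args = ("Fred",)
kwargs = {"greeting": "Hello"}

# format() unpacks; vformat() receives args and kwargs directly.
unpacked = fmt.format("{greeting}, {0}!", *args, **kwargs)
direct = fmt.vformat("{greeting}, {0}!", args, kwargs)
print(unpacked)  # Hello, Fred!
print(direct)    # Hello, Fred!
```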

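Formatter defines the following overridable methods:

  • get_value(key, args, kwargs)
  • check_unused_args(used_args, args, kwargs)
  • format_field(value, format_spec)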

'get_value' is used to retrieve a given field value. The 'key' argument will be either an integer or a string. If it is an integer, it represents the index of the positional argument in 'args'; if it is a string, then it represents a named argument in 'kwargs'.

The 'args' parameter is set to the list of positional arguments to 'vformat', and the 'kwargs' parameter is set to the dictionary of keyword arguments.

For compound field names, these functions are only called for the first component of the field name; subsequent components are handled through normal attribute and indexing operations.

So for example, the field expression '0.name' would cause 'get_value' to be called with a 'key' argument of 0. The 'name' attribute will be looked up after 'get_value' returns by calling the built-in 'getattr' function.

If the index or keyword refers to an item that does not exist, then an IndexError/KeyError should be raised.
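A sketch with a hypothetical RecordingFormatter subclass of string.Formatter that logs every key handed to get_value. For '0.name' only the integer 0 is recorded; the attribute lookup happens afterwards. A missing positional index surfaces as IndexError.

```python
from string import Formatter


class RecordingFormatter(Formatter):
    """Hypothetical subclass that records each key passed to get_value."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def get_value(self, key, args, kwargs):
        self.keys.append(key)
        return super().get_value(key, args, kwargs)


class Person:
    name = "Fred"


rec = RecordingFormatter()
result = rec.format("{0.name} / {who}", Person(), who="me")
print(result)    # Fred / me
print(rec.keys)  # [0, 'who']

# A positional index that does not exist raises IndexError.
try:
    Formatter().format("{1}", "only one")
    index_error = False
except IndexError:
    index_error = True
print(index_error)  # True
```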

'check_unused_args' is used to implement checking for unused arguments if desired. The arguments to this function are the set of all argument keys that were actually referred to in the format string (integers for positional arguments, and strings for named arguments), and a reference to the args and kwargs that were passed to vformat. The set of unused args can be calculated from these parameters. 'check_unused_args' is assumed to raise an exception if the check fails.
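A sketch of a strict check built on string.Formatter, whose default check_unused_args does nothing. StrictFormatter and its error message are hypothetical. The unused set is computed exactly as described above: every positional index and keyword name, minus the keys the template used.

```python
from string import Formatter


class StrictFormatter(Formatter):
    """Hypothetical subclass that rejects arguments the template never uses."""

    def check_unused_args(self, used_args, args, kwargs):
        unused = (set(range(len(args))) | set(kwargs)) - used_args
        if unused:
            raise ValueError(f"unused arguments: {sorted(map(str, unused))}")


strict = StrictFormatter()
ok = strict.format("{0} {name}", "a", name="b")
print(ok)  # a b

try:
    strict.format("{0}", "a", "extra", flag=True)
except ValueError as exc:
    message = str(exc)
    print(message)  # unused arguments: ['1', 'flag']
```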

'format_field' simply calls the global 'format' built-in. The method is provided so that subclasses can override it.

To get a better understanding of how these functions relate to each other, here is pseudocode that explains the general operation of vformat:

Note that the actual algorithm of the Formatter class (which will be implemented in C) may not be the one presented here. (It's likely that the actual implementation won't be a 'class' at all; rather, vformat may just call a C function which accepts the other overridable methods as arguments.) The primary purpose of this code example is to illustrate the order in which overridable methods are called.
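A runnable sketch of that call order, built on the hooks that the standard library's string.Formatter actually exposes. Besides the methods named above, this includes parse(), get_field(), and convert_field(). TracingFormatter is a hypothetical name, and this simplified vformat deliberately omits auto-numbered '{}' fields and nested fields inside format specifiers, both of which CPython's own implementation handles.

```python
import string


class TracingFormatter(string.Formatter):
    """Hypothetical simplified vformat showing the order of hook calls.
    Does not support '{}' auto-numbering or nested fields in specs."""

    def vformat(self, format_string, args, kwargs):
        pieces = []
        used_args = set()
        # parse() yields (literal_text, field_name, format_spec, conversion).
        for literal, field_name, format_spec, conversion in self.parse(format_string):
            if literal:
                pieces.append(literal)
            if field_name is None:
                continue  # literal text with no replacement field
            # get_field() calls get_value() for the first component, then
            # resolves any '.attr' or '[key]' parts with getattr/getitem.
            value, key = self.get_field(field_name, args, kwargs)
            used_args.add(key)
            # '!r' / '!s' are applied before the value reaches format_field().
            value = self.convert_field(value, conversion)
            pieces.append(self.format_field(value, format_spec))
        self.check_unused_args(used_args, args, kwargs)
        return "".join(pieces)


tf = TracingFormatter()
print(tf.format("{0!r:>8} and {name}", "hi", name="Bob"))  #     'hi' and Bob
```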

Customizing Formatters

This section describes some typical ways that Formatter objectscan be customized.

To support alternative format-string syntax, the 'vformat' method can be overridden to alter the way format strings are parsed.

One common desire is to support a 'default' namespace, so that you don't need to pass in keyword arguments to the format() method, but can instead use values in a pre-existing namespace. This can easily be done by overriding get_value() as follows:

One can use this to easily create a formatting function that allows access to global variables, for example:
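A sketch of such a get_value override on string.Formatter. NamespaceFormatter and the sample mapping are illustrative. Explicit keyword arguments are checked first, and integer keys fall through to the base class. Passing globals() (or locals()) as the namespace gives the lookup described here.

```python
from string import Formatter


class NamespaceFormatter(Formatter):
    """Falls back to a mapping (e.g. globals()) for names not passed in."""

    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = {} if namespace is None else namespace

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            try:
                # Explicitly passed keyword arguments take precedence.
                return kwargs[key]
            except KeyError:
                return self.namespace[key]
        # Integer keys are positional arguments, handled as usual.
        return super().get_value(key, args, kwargs)


fmt = NamespaceFormatter({"user": "Ann"})
print(fmt.format("Hi {user}, item {0}", 1))  # Hi Ann, item 1
print(fmt.format("Hi {user}", user="Bob"))   # Hi Bob
# For module globals: NamespaceFormatter(globals()).format("{some_name}")
```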

A similar technique can be used with the locals() dictionary to gain access to local variables.

It would also be possible to create a 'smart' namespace formatter that could automatically access both locals and globals through snooping of the calling stack. Due to the need for compatibility with the different versions of Python, such a capability will not be included in the standard library; however, it is anticipated that someone will create and publish a recipe for doing this.

Another type of customization is to change the way that built-in types are formatted by overriding the 'format_field' method. (For non-built-in types, you can simply define a __format__ special method on that type.) So for example, you could override the formatting of numbers to output scientific notation when needed.
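A sketch of that idea with a hypothetical SciFormatter. The 1e6 threshold and the '.3e' output are arbitrary choices: floats at or above the threshold that have no explicit specifier switch to scientific notation, and everything else is passed to the standard format_field.

```python
from string import Formatter


class SciFormatter(Formatter):
    """Hypothetical formatter: large floats without an explicit spec are
    shown in scientific notation; everything else is formatted normally."""

    THRESHOLD = 1e6  # assumed cut-off, chosen for illustration

    def format_field(self, value, format_spec):
        if isinstance(value, float) and not format_spec and abs(value) >= self.THRESHOLD:
            return format(value, ".3e")
        return super().format_field(value, format_spec)


sci = SciFormatter()
out = sci.format("{0} | {1} | {2:.1f}", 12345678.0, 2.5, 12345678.0)
print(out)  # 1.235e+07 | 2.5 | 12345678.0
```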

Error handling

There are two classes of exceptions which can occur during formatting: exceptions generated by the formatter code itself, and exceptions generated by user code (such as a field object's 'getattr' function).

In general, exceptions generated by the formatter code itself are of the 'ValueError' variety: there is an error in the actual 'value' of the format string. (This is not always true; for example, the str.format() function might be passed a non-string as its first parameter, which would result in a TypeError.)

The text associated with these internally generated ValueError exceptions will indicate the location of the exception inside the format string, as well as the nature of the exception.
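A sketch that only reports which exception type CPython raises in each case. It does not reproduce the exact messages or the traceback-frame mechanism described next. The helper exception_name is hypothetical.

```python
def exception_name(thunk):
    """Hypothetical helper: name of the exception raised by thunk(), if any."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return "no exception"


# Malformed format strings: errors from the formatter itself.
unclosed = exception_name(lambda: "{0".format(1))
stray = exception_name(lambda: "}".format())
# A non-string passed where the format string belongs.
non_string = exception_name(lambda: str.format(42, 1))
# An exception raised by the lookup on the argument (getattr).
user_code = exception_name(lambda: "{0.missing}".format(1))

print(unclosed, stray, non_string, user_code)
# ValueError ValueError TypeError AttributeError
```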

For exceptions generated by user code, a trace record and dummy frame will be added to the traceback stack to help in determining the location in the string where the exception occurred. The inserted traceback will indicate that the error occurred at:

where XX and YY represent the line and character position information in the string, respectively.

Naturally, one of the most contentious issues is the syntax of the format strings, and in particular the markup conventions used to indicate fields.

Rather than attempting to exhaustively list all of the various proposals, I will cover the ones that are most widely used already.

  • Shell variable syntax: $name and $(name) (or in some variants, ${name}). This is probably the oldest convention out there, and is used by Perl and many others. When used without the braces, the length of the variable is determined by lexically scanning until an invalid character is found.

    This scheme is generally used in cases where interpolation is implicit - that is, in environments where any string can contain interpolation variables, and no special substitution function need be invoked. In such cases, it is important to prevent the interpolation behavior from occurring accidentally, so the '$' (which is otherwise a relatively uncommonly-used character) is used to signal when the behavior should occur.

    It is the author's opinion, however, that in cases where the formatting is explicitly invoked, less care needs to be taken to prevent accidental interpolation, in which case a lighter and less unwieldy syntax can be used.

  • printf and its cousins ('%'), including variations that add a field index, so that fields can be interpolated out of order.

  • Other bracket-only variations. Various MUDs (Multi-User Dungeons) such as MUSH have used brackets (e.g. [name]) to do string interpolation. The Microsoft .Net libraries use braces ({}), and a syntax which is very similar to the one in this proposal, although the syntax for format specifiers is quite different. [4]

  • Backquoting. This method has the benefit of minimal syntactical clutter; however, it lacks many of the benefits of a function call syntax (such as complex expression arguments, custom formatters, etc.).

  • Other variations include Ruby's #{}, PHP's {$name}, and so on.

Some specific aspects of the syntax warrant additional comments:

1) Backslash character for escapes. The original version of this PEP used backslash rather than doubling to escape a bracket. This worked because backslashes in Python string literals that don't conform to a standard backslash sequence such as \n are left unmodified. However, this caused a certain amount of confusion, and led to potential situations of multiple recursive escapes, i.e. \\\\{ to place a literal backslash in front of a bracket.

2) The use of the colon character (':') as a separator for format specifiers. This was chosen simply because that's what .Net uses.

Restricting attribute access: An earlier version of the PEP restricted the ability to access attributes beginning with a leading underscore, for example '{0._private}'. However, this is a useful ability to have when debugging, so the feature was dropped.

Some developers suggested that the ability to do 'getattr' and 'getitem' access should be dropped entirely. However, this is in conflict with the needs of another set of developers who strongly lobbied for the ability to pass in a large dict as a single argument (without flattening it into individual keyword arguments using the **kwargs syntax) and then have the format string refer to dict entries individually.

There have also been suggestions to expand the set of expressions that are allowed in a format string. However, this was seen to go against the spirit of TOOWTDI, since the same effect can be achieved in most cases by executing the same expression on the parameter before it's passed in to the formatting function. For cases where the format string is being used to do arbitrary formatting in a data-rich environment, it's recommended to use a template engine specialized for this purpose, such as Genshi [5] or Cheetah [6].

References

[1] Python Library Reference - String formatting operations. http://docs.python.org/library/stdtypes.html#string-formatting-operations
[2] Python Library Reference - Template strings. http://docs.python.org/library/string.html#string.Template
[3] [Python-3000] String formating operations in python 3k. https://mail.python.org/pipermail/python-3000/2006-April/000285.html
[4] Composite Formatting - [.Net Framework Developer's Guide]. http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true
[5] Genshi templating engine. http://genshi.edgewall.org/
[6] Cheetah - The Python-Powered Template Engine. http://www.cheetahtemplate.org/

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/master/pep-3101.txt