python-shellwords

Parse a string into words like a POSIX shell does.

When using Python 2.3 or later, you may want to use the standard module shlex instead of shellwords. Simply use this code:

import shlex
words = shlex.split(input_line)
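
As a concrete illustration, here is how a line containing a quoted argument and an escaped space might be split. The input string is made up for this example and is not taken from the module's documentation or test data:

```python
import shlex

# Hypothetical input line: one double-quoted argument and one escaped space.
input_line = 'cp "My Documents/report.txt" backup\\ dir/'

# shlex.split() parses in POSIX mode by default.
words = shlex.split(input_line)
print(words)  # ['cp', 'My Documents/report.txt', 'backup dir/']
```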

Why this module?

Out in the wild there are quite a few modules for executing commands in a sub-process. Most of them take a string (the command line) as input and use os.exec* to execute the program. This requires splitting the string into words exactly like the shell does. If this parsing/splitting is incorrect, you can have quite a fun time debugging. ;-) Since I didn’t find any other module like that, I decided to develop this one.

Benefits of using this module

Using this module has the following benefits:

  • Enables your application or module to mimic word splitting like a POSIX shell without effort.

  • Saves you debugging time when doing word splitting.

  • Avoids confusing the users of your application/module when splitting shell command lines into words, since this module behaves exactly like a POSIX shell does.

  • The unit test suite proves the correct word splitting. Currently 75 command lines are used, each testing a special pattern. The input data for this test suite consists of command lines which are split by the shell on the fly. You can add your own test patterns without any hassle.

Semantics

This module parses a string into words according to the parsing rules of a POSIX shell. These parsing rules are (quoted after ‘man bash’):

  1. Words are split at whitespace characters; these are Space, Tab, Newline, Carriage-Return, Vertical-Tab (0B) and Form-Feed (0C).

    NB: Quotes do not separate words! Thus "My"Fancy"Computer" will be parsed into a single word: MyFancyComputer

  2. A non-quoted backslash (\) is the escape character. It preserves the literal value of the next character that follows.

  3. Enclosing characters in single quotes preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.

    This means: backslash (\) has no special meaning within single quotes. All characters within single quotes are taken as-is.

  4. Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of \. The backslash retains its special meaning only when followed by " or \. A double quote may be quoted within double quotes by preceding it with a backslash.
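
The four rules above can be sketched as a small character-by-character splitter. This is only an illustration of the rules as stated here, not this module's actual implementation; the names split_words and WHITESPACE are hypothetical, and the error handling for unterminated quotes or a trailing backslash is an assumption of the sketch.

```python
# Rule 1 whitespace: Space, Tab, Newline, Carriage-Return, Vertical-Tab, Form-Feed.
WHITESPACE = " \t\n\r\x0b\x0c"


def split_words(line):
    """Split `line` into words following rules 1-4 above (illustrative sketch)."""
    words = []
    current = []      # characters of the word being built
    in_word = False   # True once a word has started, even an empty quoted one
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch in WHITESPACE:
            # Rule 1: unquoted whitespace ends the current word.
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            i += 1
        elif ch == "\\":
            # Rule 2: an unquoted backslash keeps the next character literally.
            if i + 1 >= n:
                raise ValueError("trailing backslash")
            current.append(line[i + 1])
            in_word = True
            i += 2
        elif ch == "'":
            # Rule 3: everything up to the next single quote is taken as-is.
            end = line.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated single quote")
            current.append(line[i + 1:end])
            in_word = True
            i = end + 1
        elif ch == '"':
            # Rule 4: inside double quotes, a backslash is special
            # only when followed by a double quote or another backslash.
            in_word = True
            i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated double quote")
                ch = line[i]
                if ch == '"':
                    i += 1
                    break
                if ch == "\\" and i + 1 < n and line[i + 1] in '"\\':
                    current.append(line[i + 1])
                    i += 2
                else:
                    current.append(ch)
                    i += 1
        else:
            # Ordinary character: part of the current word.
            current.append(ch)
            in_word = True
            i += 1
    if in_word:
        words.append("".join(current))
    return words


print(split_words('"My"Fancy"Computer"'))  # ['MyFancyComputer']
```

Note that quotes never end a word in this sketch; only unquoted whitespace does, which is why the example from rule 1 comes out as a single word.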

Frequently Asked Questions

Q

Hey, there is ‘shlex’ coming with Python. Why is there a need for this module?

A

I know ‘shlex’ and I gave it a try. But ‘shlex’ treats quotes as word delimiters, which differs from the shell semantics (see above). And even if ‘shlex’ parsed strings as needed, I would have written a (very, very) thin layer on top, since ‘shlex’ is simple but seldom used for this kind of job.

Requirements

Requires Python >= 2.0

Download

Release 0.2 (current)
Release 0.1