This is ../../info/elisp, produced by makeinfo version 4.11 from
elisp.texi.

This is edition 3.0 of the GNU Emacs Lisp Reference Manual,
corresponding to Emacs version 23.2.

   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1998, 1999,
2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010  Free
Software Foundation, Inc.

     Permission is granted to copy, distribute and/or modify this
     document under the terms of the GNU Free Documentation License,
     Version 1.3 or any later version published by the Free Software
     Foundation; with the Invariant Sections being "GNU General Public
     License," with the Front-Cover texts being "A GNU Manual," and
     with the Back-Cover Texts as in (a) below.  A copy of the license
     is included in the section entitled "GNU Free Documentation
     License."

     (a) The FSF's Back-Cover Text is: "You have the freedom to copy and
     modify this GNU manual.  Buying copies from the FSF supports it in
     developing GNU and promoting software freedom."

INFO-DIR-SECTION Emacs
START-INFO-DIR-ENTRY
* Elisp: (elisp).       The Emacs Lisp Reference Manual.
END-INFO-DIR-ENTRY


File: elisp,  Node: Buffer Contents,  Next: Comparing Text,  Prev: Near Point,  Up: Text

32.2 Examining Buffer Contents
==============================

This section describes functions that allow a Lisp program to convert
any portion of the text in the buffer into a string.

 -- Function: buffer-substring start end
     This function returns a string containing a copy of the text of the
     region defined by positions START and END in the current buffer.
     If the arguments are not positions in the accessible portion of
     the buffer, `buffer-substring' signals an `args-out-of-range'
     error.

     It is not necessary for START to be less than END; the arguments
     can be given in either order, although the smaller argument is
     usually written first.

     Here's an example which assumes Font-Lock mode is not enabled:

          ---------- Buffer: foo ----------
          This is the contents of buffer foo

          ---------- Buffer: foo ----------

          (buffer-substring 1 10)
               => "This is t"
          (buffer-substring (point-max) 10)
               => "he contents of buffer foo\n"

     If the text being copied has any text properties, these are copied
     into the string along with the characters they belong to.  *Note
     Text Properties::.  However, overlays (*note Overlays::) in the
     buffer and their properties are ignored, not copied.

     For example, if Font-Lock mode is enabled, you might get results
     like these:

          (buffer-substring 1 10)
               => #("This is t" 0 1 (fontified t) 1 9 (fontified t))

 -- Function: buffer-substring-no-properties start end
     This is like `buffer-substring', except that it does not copy text
     properties, just the characters themselves.  *Note Text
     Properties::.
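
   The difference between these two functions shows up in a buffer
whose text carries a property.  The following is a minimal sketch,
not code from Emacs itself: the name `my-substring-props-demo' and the
sample text are made up for illustration, and a temporary buffer is
used so that Font-Lock mode cannot add properties of its own.

```elisp
(defun my-substring-props-demo ()
  "Return the face property seen by each substring function.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    ;; Characters 1 through 7 ("This is") carry a face property.
    (insert (propertize "This is" 'face 'bold) " the contents")
    (list
     ;; buffer-substring copies the property into the string.
     (get-text-property 0 'face (buffer-substring 1 8))
     ;; The arguments may be given in either order.
     (get-text-property 0 'face (buffer-substring 8 1))
     ;; The no-properties variant copies only the characters.
     (get-text-property 0 'face
                        (buffer-substring-no-properties 1 8)))))

(my-substring-props-demo)
;; => (bold bold nil)
```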

 -- Function: filter-buffer-substring start end &optional delete noprops
     This function passes the buffer text between START and END through
     the filter functions specified by the variable
     `buffer-substring-filters', and returns the value from the last
     filter function.  If `buffer-substring-filters' is `nil', the
     value is the unaltered text from the buffer, that is, what
     `buffer-substring' would return.

     If DELETE is non-`nil', this function deletes the text between
     START and END after copying it, like `delete-and-extract-region'.

     If NOPROPS is non-`nil', the final string returned does not
     include text properties, while the string passed through the
     filters still includes text properties from the buffer text.

     Lisp code should use this function instead of `buffer-substring',
     `buffer-substring-no-properties', or `delete-and-extract-region'
     when copying into user-accessible data structures such as the
     kill-ring, X clipboard, and registers.  Major and minor modes can
     add functions to `buffer-substring-filters' to alter such text as
     it is copied out of the buffer.

 -- Variable: buffer-substring-filters
     This variable should be a list of functions that accept a single
     argument, a string, and return a string.
     `filter-buffer-substring' passes the buffer substring to the first
     function in this list, and the return value of each function is
     passed to the next function.  The return value of the last
     function is used as the return value of `filter-buffer-substring'.

     As a special convention, point is set to the start of the buffer
     text being operated on (i.e., the START argument for
     `filter-buffer-substring') before these functions are called.

     If this variable is `nil', no filtering is performed.
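
   The chaining described above amounts to a left-to-right fold over
the list of functions.  The sketch below models only that chaining in
plain Lisp.  It is not the code of `filter-buffer-substring': it
ignores DELETE and NOPROPS and does not move point, and the names
`my-run-string-filters' and `my-trim-trailing-space' are hypothetical.

```elisp
(defun my-run-string-filters (filters string)
  "Pass STRING through each function in FILTERS, first to last.
Return the last function's value, or STRING itself if FILTERS is nil."
  (dolist (filter filters string)
    (setq string (funcall filter string))))

(defun my-trim-trailing-space (string)
  "Return STRING without trailing spaces and tabs."
  (if (string-match "[ \t]+\\'" string)
      (substring string 0 (match-beginning 0))
    string))

;; Each filter receives the value returned by the previous one.
(my-run-string-filters (list 'my-trim-trailing-space 'upcase) "foo  ")
;; => "FOO"

;; With an empty list, no filtering is performed.
(my-run-string-filters nil "foo  ")
;; => "foo  "
```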

 -- Function: buffer-string
     This function returns the contents of the entire accessible
     portion of the current buffer as a string.  It is equivalent to

          (buffer-substring (point-min) (point-max))

          ---------- Buffer: foo ----------
          This is the contents of buffer foo

          ---------- Buffer: foo ----------

          (buffer-string)
               => "This is the contents of buffer foo\n"
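
   The phrase "accessible portion" matters when the buffer is
narrowed.  The following sketch, in which the function name
`my-buffer-string-narrowed-demo' is hypothetical, narrows a temporary
buffer and shows both what `buffer-string' returns and the error that
`buffer-substring' signals for a position outside the narrowing.

```elisp
(defun my-buffer-string-narrowed-demo ()
  "Return what buffer-string and buffer-substring see when narrowed.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "This is the contents of buffer foo\n")
    (save-restriction
      (narrow-to-region 1 5)            ; only "This" is accessible
      (list (buffer-string)
            ;; Position 10 is now outside the accessible portion.
            (condition-case nil
                (buffer-substring 1 10)
              (args-out-of-range 'out-of-range))))))

(my-buffer-string-narrowed-demo)
;; => ("This" out-of-range)
```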

 -- Function: current-word &optional strict really-word
     This function returns the symbol (or word) at or near point, as a
     string.  The return value includes no text properties.

     If the optional argument REALLY-WORD is non-`nil', it finds a
     word; otherwise, it finds a symbol (which includes both word
     characters and symbol constituent characters).

     If the optional argument STRICT is non-`nil', then point must be
     in or next to the symbol or word--if no symbol or word is there,
     the function returns `nil'.  Otherwise, a nearby symbol or word on
     the same line is acceptable.

 -- Function: thing-at-point thing
     Return the THING around or next to point, as a string.

     The argument THING is a symbol which specifies a kind of syntactic
     entity.  Possibilities include `symbol', `list', `sexp', `defun',
     `filename', `url', `word', `sentence', `whitespace', `line',
     `page', and others.

          ---------- Buffer: foo ----------
          Gentlemen may cry ``Pea-!-ce! Peace!,''
          but there is no peace.
          ---------- Buffer: foo ----------

          (thing-at-point 'word)
               => "Peace"
          (thing-at-point 'line)
               => "Gentlemen may cry ``Peace! Peace!,''\n"
          (thing-at-point 'whitespace)
               => nil
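
   The distinction between a symbol and a word depends on the syntax
table in effect.  The sketch below is illustrative only: the function
name `my-things-at-point-demo' is hypothetical, and it selects the
Emacs Lisp syntax table explicitly, on the assumption that `-' has
symbol-constituent syntax there.

```elisp
(defun my-things-at-point-demo ()
  "Return the symbol and the word found at the same position.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (with-syntax-table emacs-lisp-mode-syntax-table
      (insert "(setq fill-column 72)")
      (goto-char 9)                     ; inside "fill"
      (list (current-word)              ; symbol at point
            (current-word nil t)        ; REALLY-WORD: word only
            (thing-at-point 'symbol)
            (thing-at-point 'word)))))

(my-things-at-point-demo)
;; => ("fill-column" "fill" "fill-column" "fill")
```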


File: elisp,  Node: Comparing Text,  Next: Insertion,  Prev: Buffer Contents,  Up: Text

32.3 Comparing Text
===================

This function lets you compare portions of the text in a buffer, without
copying them into strings first.

 -- Function: compare-buffer-substrings buffer1 start1 end1 buffer2
          start2 end2
     This function lets you compare two substrings of the same buffer
     or of two different buffers.  The first three arguments specify one
     substring, giving a buffer (or a buffer name) and two positions
     within the buffer.  The last three arguments specify the other
     substring in the same way.  You can use `nil' for BUFFER1,
     BUFFER2, or both to stand for the current buffer.

     The value is negative if the first substring is less, positive if
     the first is greater, and zero if they are equal.  The absolute
     value of the result is one plus the index of the first differing
     characters within the substrings.

     This function ignores case when comparing characters if
     `case-fold-search' is non-`nil'.  It always ignores text
     properties.

     Suppose the current buffer contains the text `foobarbar
     haha!rara!'; then in this example the two substrings are `rbar '
     and `rara!'.  The value is 2 because the first substring is greater
     at the second character.

          (compare-buffer-substrings nil 6 11 nil 16 21)
               => 2
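
   The sign convention is easier to see with the arguments swapped.
The following sketch reuses the text of the example above in a
temporary buffer; the function name `my-compare-demo' is hypothetical,
and `case-fold-search' is bound to `nil' so that the result does not
depend on the caller's setting.

```elisp
(defun my-compare-demo ()
  "Compare substrings of one buffer in both directions.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "foobarbar haha!rara!")
    (let ((case-fold-search nil))
      (list
       ;; "rbar " against "rara!": first is greater at character 2.
       (compare-buffer-substrings nil 6 11 nil 16 21)
       ;; Swapping the substrings flips the sign.
       (compare-buffer-substrings nil 16 21 nil 6 11)
       ;; Identical substrings compare equal.
       (compare-buffer-substrings nil 1 4 nil 1 4)))))

(my-compare-demo)
;; => (2 -2 0)
```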


File: elisp,  Node: Insertion,  Next: Commands for Insertion,  Prev: Comparing Text,  Up: Text

32.4 Inserting Text
===================

"Insertion" means adding new text to a buffer.  The inserted text goes
at point--between the character before point and the character after
point.  Some insertion functions leave point before the inserted text,
while other functions leave it after.  We call the former insertion
"after point" and the latter insertion "before point".

   Insertion relocates markers that point at positions after the
insertion point, so that they stay with the surrounding text (*note
Markers::).  When a marker points at the place of insertion, insertion
may or may not relocate the marker, depending on the marker's insertion
type (*note Marker Insertion Types::).  Certain special functions such
as `insert-before-markers' relocate all such markers to point after the
inserted text, regardless of the markers' insertion type.

   Insertion functions signal an error if the current buffer is
read-only or if they insert within read-only text.

   These functions copy text characters from strings and buffers along
with their properties.  The inserted characters have exactly the same
properties as the characters they were copied from.  By contrast,
characters specified as separate arguments, not part of a string or
buffer, inherit their text properties from the neighboring text.

   The insertion functions convert text from unibyte to multibyte in
order to insert in a multibyte buffer, and vice versa--if the text
comes from a string or from a buffer.  However, they do not convert
unibyte character codes 128 through 255 to multibyte characters, not
even if the current buffer is a multibyte buffer.  *Note Converting
Representations::.

 -- Function: insert &rest args
     This function inserts the strings and/or characters ARGS into the
     current buffer, at point, moving point forward.  In other words, it
     inserts the text before point.  An error is signaled unless all
     ARGS are either strings or characters.  The value is `nil'.

 -- Function: insert-before-markers &rest args
     This function inserts the strings and/or characters ARGS into the
     current buffer, at point, moving point forward.  An error is
     signaled unless all ARGS are either strings or characters.  The
     value is `nil'.

     This function is unlike the other insertion functions in that it
     relocates markers initially pointing at the insertion point, to
     point after the inserted text.  If an overlay begins at the
     insertion point, the inserted text falls outside the overlay; if a
     nonempty overlay ends at the insertion point, the inserted text
     falls inside that overlay.
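
   The different treatment of markers by `insert' and
`insert-before-markers' can be observed directly.  This is a minimal
sketch with a hypothetical function name, `my-marker-relocation-demo';
it relies on the marker having the default insertion type (`nil').

```elisp
(defun my-marker-relocation-demo ()
  "Show how a marker at the insertion point is relocated.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "ab")
    (goto-char 2)                       ; between "a" and "b"
    (let ((m (point-marker)))           ; marker at position 2
      (insert "X")                      ; marker is not relocated
      (let ((after-insert (marker-position m)))
        (goto-char m)
        (insert-before-markers "Y")     ; marker moves past "Y"
        (list after-insert
              (marker-position m)
              (buffer-string))))))

(my-marker-relocation-demo)
;; => (2 3 "aYXb")
```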

 -- Function: insert-char character count &optional inherit
     This function inserts COUNT instances of CHARACTER into the
     current buffer before point.  The argument COUNT should be an
     integer, and CHARACTER must be a character.  The value is `nil'.

     This function does not convert unibyte character codes 128 through
     255 to multibyte characters, not even if the current buffer is a
     multibyte buffer.  *Note Converting Representations::.

     If INHERIT is non-`nil', then the inserted characters inherit
     sticky text properties from the two characters before and after the
     insertion point.  *Note Sticky Properties::.
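
   As a brief illustration of `insert' and `insert-char' together,
the sketch below builds an underlined title in a temporary buffer and
then shows the error signaled for an argument that is neither a string
nor a character.  The function name `my-insert-demo' is hypothetical.

```elisp
(defun my-insert-demo ()
  "Insert mixed arguments and report the resulting text and point.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "Title" ?\n)                ; strings and characters mix
    (insert-char ?= 5)                  ; five copies of "="
    (list (buffer-string)
          (point)                       ; point is after the text
          ;; A symbol is neither a string nor a character.
          (condition-case nil
              (insert 'not-text)
            (wrong-type-argument 'rejected)))))

(my-insert-demo)
;; => ("Title\n=====" 12 rejected)
```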

 -- Function: insert-buffer-substring from-buffer-or-name &optional
          start end
     This function inserts a portion of buffer FROM-BUFFER-OR-NAME
     (which must already exist) into the current buffer before point.
     The text inserted is the region between START and END.  (These
     arguments default to the beginning and end of the accessible
     portion of that buffer.)  This function returns `nil'.

     In this example, the form is executed with buffer `bar' as the
     current buffer.  We assume that buffer `bar' is initially empty.

          ---------- Buffer: foo ----------
          We hold these truths to be self-evident, that all
          ---------- Buffer: foo ----------

          (insert-buffer-substring "foo" 1 20)
               => nil

          ---------- Buffer: bar ----------
          We hold these truth-!-
          ---------- Buffer: bar ----------

 -- Function: insert-buffer-substring-no-properties from-buffer-or-name
          &optional start end
     This is like `insert-buffer-substring' except that it does not
     copy any text properties.

   *Note Sticky Properties::, for other insertion functions that inherit
text properties from the nearby text in addition to inserting it.
Whitespace inserted by indentation functions also inherits text
properties.
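
   The `insert-buffer-substring' example above can be reproduced
without relying on existing buffers.  In this sketch the function name
`my-insert-buffer-substring-demo' and the buffer name ` *my-source*'
are hypothetical; the source buffer is created for the occasion and
killed afterward.

```elisp
(defun my-insert-buffer-substring-demo ()
  "Copy part of one buffer into another; return text and point.
Hypothetical helper, for illustration only."
  (let ((source (generate-new-buffer " *my-source*")))
    (unwind-protect
        (progn
          (with-current-buffer source
            (insert
             "We hold these truths to be self-evident, that all"))
          (with-temp-buffer
            ;; Positions 1 through 19 of SOURCE are copied here.
            (insert-buffer-substring source 1 20)
            (list (buffer-string) (point))))
      (kill-buffer source))))

(my-insert-buffer-substring-demo)
;; => ("We hold these truth" 20)
```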


File: elisp,  Node: Commands for Insertion,  Next: Deletion,  Prev: Insertion,  Up: Text

32.5 User-Level Insertion Commands
==================================

This section describes higher-level commands for inserting text,
commands intended primarily for the user but useful also in Lisp
programs.

 -- Command: insert-buffer from-buffer-or-name
     This command inserts the entire accessible contents of
     FROM-BUFFER-OR-NAME (which must exist) into the current buffer
     after point.  It leaves the mark after the inserted text.  The
     value is `nil'.

 -- Command: self-insert-command count
     This command inserts the last character typed; it does so COUNT
     times, before point, and returns `nil'.  Most printing characters
     are bound to this command.  In routine use, `self-insert-command'
     is the most frequently called function in Emacs, but programs
     rarely use it except to install it on a keymap.

     In an interactive call, COUNT is the numeric prefix argument.

     Self-insertion translates the input character through
     `translation-table-for-input'.  *Note Translation of Characters::.

     This command calls `auto-fill-function' whenever that is non-`nil'
     and the character inserted is in the table `auto-fill-chars'
     (*note Auto Filling::).

     This command performs abbrev expansion if Abbrev mode is enabled
     and the inserted character does not have word-constituent syntax.
     (*Note Abbrevs::, and *note Syntax Class Table::.)  It is also
     responsible for calling `blink-paren-function' when the inserted
     character has close parenthesis syntax (*note Blinking::).

     Do not try substituting your own definition of
     `self-insert-command' for the standard one.  The editor command
     loop handles this function specially.

 -- Command: newline &optional number-of-newlines
     This command inserts newlines into the current buffer before point.
     If NUMBER-OF-NEWLINES is supplied, that many newline characters
     are inserted.

     This function calls `auto-fill-function' if the current column
     number is greater than the value of `fill-column' and
     NUMBER-OF-NEWLINES is `nil'.  Typically what `auto-fill-function'
     does is insert a newline; thus, the overall result in this case is
     to insert two newlines at different places: one at point, and
     another earlier in the line.  `newline' does not auto-fill if
     NUMBER-OF-NEWLINES is non-`nil'.

     This command indents to the left margin if that is not zero.
     *Note Margins::.

     The value returned is `nil'.  In an interactive call,
     NUMBER-OF-NEWLINES is the numeric prefix argument.

 -- Variable: overwrite-mode
     This variable controls whether overwrite mode is in effect.  The
     value should be `overwrite-mode-textual', `overwrite-mode-binary',
     or `nil'.  `overwrite-mode-textual' specifies textual overwrite
     mode (treats newlines and tabs specially), and
     `overwrite-mode-binary' specifies binary overwrite mode (treats
     newlines and tabs like any other characters).


File: elisp,  Node: Deletion,  Next: User-Level Deletion,  Prev: Commands for Insertion,  Up: Text

32.6 Deleting Text
==================

Deletion means removing part of the text in a buffer, without saving it
in the kill ring (*note The Kill Ring::).  Deleted text can't be
yanked, but can be reinserted using the undo mechanism (*note Undo::).
Some deletion functions do save text in the kill ring in some special
cases.

   All of the deletion functions operate on the current buffer.

 -- Command: erase-buffer
     This function deletes the entire text of the current buffer (_not_
     just the accessible portion), leaving it empty.  If the buffer is
     read-only, it signals a `buffer-read-only' error; if some of the
     text in it is read-only, it signals a `text-read-only' error.
     Otherwise, it deletes the text without asking for any
     confirmation.  It returns `nil'.

     Normally, deleting a large amount of text from a buffer inhibits
     further auto-saving of that buffer "because it has shrunk."
     However, `erase-buffer' does not do this, the idea being that the
     future text is not really related to the former text, and its size
     should not be compared with that of the former text.
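
   The remark that `erase-buffer' is not limited to the accessible
portion can be checked against `delete-region' in a narrowed buffer.
The following is a sketch only; the function name
`my-erase-vs-delete-demo' is hypothetical.

```elisp
(defun my-erase-vs-delete-demo ()
  "Compare delete-region and erase-buffer in a narrowed buffer.
Hypothetical helper, for illustration only."
  (list
   (with-temp-buffer
     (insert "abcdef")
     (narrow-to-region 2 4)             ; accessible portion is "bc"
     (delete-region (point-min) (point-max))
     (widen)
     (buffer-string))
   (with-temp-buffer
     (insert "abcdef")
     (narrow-to-region 2 4)
     (erase-buffer)                     ; ignores the narrowing
     (widen)
     (buffer-string))))

(my-erase-vs-delete-demo)
;; => ("adef" "")
```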

 -- Command: delete-region start end
     This command deletes the text between positions START and END in
     the current buffer, and returns `nil'.  If point was inside the
     deleted region, its value afterward is START.  Otherwise, point
     relocates with the surrounding text, as markers do.

 -- Function: delete-and-extract-region start end
     This function deletes the text between positions START and END in
     the current buffer, and returns a string containing the text just
     deleted.

     If point was inside the deleted region, its value afterward is
     START.  Otherwise, point relocates with the surrounding text, as
     markers do.
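
   Here is a small sketch of `delete-and-extract-region', including
the relocation of point when it lies inside the deleted text.  The
function name `my-extract-region-demo' is hypothetical.

```elisp
(defun my-extract-region-demo ()
  "Delete a region and return the text, the remainder, and point.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "hello, world")
    (goto-char 4)                       ; inside the doomed region
    (let ((removed (delete-and-extract-region 1 8)))
      (list removed (buffer-string) (point)))))

(my-extract-region-demo)
;; => ("hello, " "world" 1)
```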

 -- Command: delete-char count &optional killp
     This command deletes COUNT characters directly after point, or
     before point if COUNT is negative.  If KILLP is non-`nil', then it
     saves the deleted characters in the kill ring.

     In an interactive call, COUNT is the numeric prefix argument, and
     KILLP is the unprocessed prefix argument.  Therefore, if a prefix
     argument is supplied, the text is saved in the kill ring.  If no
     prefix argument is supplied, then one character is deleted, but
     not saved in the kill ring.

     The value returned is always `nil'.
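
   The effect of the sign of COUNT can be seen in the following
sketch, which calls `delete-char' from Lisp without KILLP, so nothing
is saved in the kill ring.  The function name `my-delete-char-demo' is
hypothetical.

```elisp
(defun my-delete-char-demo ()
  "Delete characters after and then before point.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "abcdef")
    (goto-char 3)                       ; between "b" and "c"
    (delete-char 2)                     ; removes "cd", after point
    (delete-char -1)                    ; removes "b", before point
    (list (buffer-string) (point))))

(my-delete-char-demo)
;; => ("aef" 2)
```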

 -- Command: delete-backward-char count &optional killp
     This command deletes COUNT characters directly before point, or
     after point if COUNT is negative.  If KILLP is non-`nil', then it
     saves the deleted characters in the kill ring.

     In an interactive call, COUNT is the numeric prefix argument, and
     KILLP is the unprocessed prefix argument.  Therefore, if a prefix
     argument is supplied, the text is saved in the kill ring.  If no
     prefix argument is supplied, then one character is deleted, but
     not saved in the kill ring.

     The value returned is always `nil'.

 -- Command: backward-delete-char-untabify count &optional killp
     This command deletes COUNT characters backward, changing tabs into
     spaces.  When the next character to be deleted is a tab, it is
     first replaced with the proper number of spaces to preserve
     alignment and then one of those spaces is deleted instead of the
     tab.  If KILLP is non-`nil', then the command saves the deleted
     characters in the kill ring.

     Conversion of tabs to spaces happens only if COUNT is positive.
     If it is negative, exactly -COUNT characters after point are
     deleted.

     In an interactive call, COUNT is the numeric prefix argument, and
     KILLP is the unprocessed prefix argument.  Therefore, if a prefix
     argument is supplied, the text is saved in the kill ring.  If no
     prefix argument is supplied, then one character is deleted, but
     not saved in the kill ring.

     The value returned is always `nil'.

 -- User Option: backward-delete-char-untabify-method
     This option specifies how `backward-delete-char-untabify' should
     deal with whitespace.  Possible values include `untabify', the
     default, meaning convert a tab to many spaces and delete one;
     `hungry', meaning delete all tabs and spaces before point with one
     command; `all', meaning delete all tabs, spaces and newlines
     before point; and `nil', meaning do nothing special for whitespace
     characters.


File: elisp,  Node: User-Level Deletion,  Next: The Kill Ring,  Prev: Deletion,  Up: Text

32.7 User-Level Deletion Commands
=================================

This section describes higher-level commands for deleting text,
commands intended primarily for the user but useful also in Lisp
programs.

 -- Command: delete-horizontal-space &optional backward-only
     This function deletes all spaces and tabs around point.  It returns
     `nil'.

     If BACKWARD-ONLY is non-`nil', the function deletes spaces and
     tabs before point, but not after point.

     In the following examples, we call `delete-horizontal-space' four
     times, once on each line, with point between the second and third
     characters on the line each time.

          ---------- Buffer: foo ----------
          I -!-thought
          I -!-     thought
          We-!- thought
          Yo-!-u thought
          ---------- Buffer: foo ----------

          (delete-horizontal-space)   ; Four times.
               => nil

          ---------- Buffer: foo ----------
          Ithought
          Ithought
          Wethought
          You thought
          ---------- Buffer: foo ----------
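
   The effect of BACKWARD-ONLY is not covered by the example above.
The sketch below, with the hypothetical function name
`my-delete-horizontal-space-demo', places point in the middle of six
spaces and calls the function first with BACKWARD-ONLY and then
without it.

```elisp
(defun my-delete-horizontal-space-demo ()
  "Delete whitespace before point, then all around point.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "I" (make-string 6 ?\s) "thought")
    (goto-char 5)                       ; three spaces on each side
    (list
     (progn (delete-horizontal-space t) ; only before point
            (buffer-string))
     (progn (delete-horizontal-space)   ; both sides of point
            (buffer-string)))))

(my-delete-horizontal-space-demo)
;; => ("I   thought" "Ithought")
```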

 -- Command: delete-indentation &optional join-following-p
     This function joins the line point is on to the previous line,
     deleting any whitespace at the join and in some cases replacing it
     with one space.  If JOIN-FOLLOWING-P is non-`nil',
     `delete-indentation' joins this line to the following line
     instead.  The function returns `nil'.

     If there is a fill prefix, and the second of the lines being joined
     starts with the prefix, then `delete-indentation' deletes the fill
     prefix before joining the lines.  *Note Margins::.

     In the example below, point is located on the line starting
     `events', and it makes no difference if there are trailing spaces
     in the preceding line.

          ---------- Buffer: foo ----------
          When in the course of human
          -!-    events, it becomes necessary
          ---------- Buffer: foo ----------

          (delete-indentation)
               => nil

          ---------- Buffer: foo ----------
          When in the course of human-!- events, it becomes necessary
          ---------- Buffer: foo ----------

     After the lines are joined, the function `fixup-whitespace' is
     responsible for deciding whether to leave a space at the junction.

 -- Command: fixup-whitespace
     This function replaces all the horizontal whitespace surrounding
     point with either one space or no space, according to the context.
     It returns `nil'.

     At the beginning or end of a line, the appropriate amount of space
     is none.  Before a character with close parenthesis syntax, or
     after a character with open parenthesis or expression-prefix
     syntax, the appropriate amount is also none.  Otherwise, one space
     is appropriate.  *Note Syntax Class Table::.

     In the example below, `fixup-whitespace' is called the first time
     with point before the word `spaces' in the first line.  For the
     second invocation, point is directly after the `('.

          ---------- Buffer: foo ----------
          This has too many     -!-spaces
          This has too many spaces at the start of (-!-   this list)
          ---------- Buffer: foo ----------

          (fixup-whitespace)
               => nil
          (fixup-whitespace)
               => nil

          ---------- Buffer: foo ----------
          This has too many spaces
          This has too many spaces at the start of (this list)
          ---------- Buffer: foo ----------

 -- Command: just-one-space &optional n
     This command replaces any spaces and tabs around point with a
     single space, or N spaces if N is specified.  It returns `nil'.

 -- Command: delete-blank-lines
     This function deletes blank lines surrounding point.  If point is
     on a blank line with one or more blank lines before or after it,
     then all but one of them are deleted.  If point is on an isolated
     blank line, then it is deleted.  If point is on a nonblank line,
     the command deletes all blank lines immediately following it.

     A blank line is defined as a line containing only tabs and spaces.

     `delete-blank-lines' returns `nil'.
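
   Although these commands are intended mainly for the user, they
can be called from Lisp as well.  The following sketch runs
`delete-indentation' and `fixup-whitespace' in temporary buffers; the
function names `my-join-lines-demo' and `my-fixup-whitespace-demo' are
hypothetical, and the sketch assumes that no fill prefix is in effect.

```elisp
(defun my-join-lines-demo ()
  "Join the second line of a two-line buffer to the first.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "When in the course of human\n"
            "    events, it becomes necessary")
    ;; Point is on the second line; join it to the previous one.
    (delete-indentation)
    (buffer-string)))

(my-join-lines-demo)
;; => "When in the course of human events, it becomes necessary"

(defun my-fixup-whitespace-demo ()
  "Remove the whitespace that follows an open parenthesis.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert "start of (   this list)")
    (goto-char 11)                      ; directly after the "("
    (fixup-whitespace)
    (buffer-string)))

(my-fixup-whitespace-demo)
;; => "start of (this list)"
```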


File: elisp,  Node: The Kill Ring,  Next: Undo,  Prev: User-Level Deletion,  Up: Text

32.8 The Kill Ring
==================

"Kill functions" delete text like the deletion functions, but save it
so that the user can reinsert it by "yanking".  Most of these functions
have `kill-' in their name.  By contrast, the functions whose names
start with `delete-' normally do not save text for yanking (though they
can still be undone); these are "deletion" functions.

   Most of the kill commands are primarily for interactive use, and are
not described here.  What we do describe are the functions provided for
use in writing such commands.  You can use these functions to write
commands for killing text.  When you need to delete text for internal
purposes within a Lisp function, you should normally use deletion
functions, so as not to disturb the kill ring contents.  *Note
Deletion::.

   Killed text is saved for later yanking in the "kill ring".  This is
a list that holds a number of recent kills, not just the last text
kill.  We call this a "ring" because yanking treats it as having
elements in a cyclic order.  The list is kept in the variable
`kill-ring', and can be operated on with the usual functions for lists;
there are also specialized functions, described in this section, that
treat it as a ring.

   Some people think this use of the word "kill" is unfortunate, since
it refers to operations that specifically _do not_ destroy the entities
"killed."  This is in sharp contrast to ordinary life, in which death
is permanent and "killed" entities do not come back to life.
Therefore, other metaphors have been proposed.  For example, the term
"cut ring" makes sense to people who, in pre-computer days, used
scissors and paste to cut up and rearrange manuscripts.  However, it
would be difficult to change the terminology now.

* Menu:

* Kill Ring Concepts::     What text looks like in the kill ring.
* Kill Functions::         Functions that kill text.
* Yanking::                How yanking is done.
* Yank Commands::          Commands that access the kill ring.
* Low-Level Kill Ring::    Functions and variables for kill ring access.
* Internals of Kill Ring:: Variables that hold kill ring data.


File: elisp,  Node: Kill Ring Concepts,  Next: Kill Functions,  Up: The Kill Ring

32.8.1 Kill Ring Concepts
-------------------------

The kill ring records killed text as strings in a list, most recent
first.  A short kill ring, for example, might look like this:

     ("some text" "a different piece of text" "even older text")

When the list reaches `kill-ring-max' entries in length, adding a new
entry automatically deletes the last entry.

   When kill commands are interwoven with other commands, each kill
command makes a new entry in the kill ring.  Multiple kill commands in
succession build up a single kill ring entry, which would be yanked as a
unit; the second and subsequent consecutive kill commands add text to
the entry made by the first one.

   For yanking, one entry in the kill ring is designated the "front" of
the ring.  Some yank commands "rotate" the ring by designating a
different element as the "front."  But this virtual rotation doesn't
change the list itself--the most recent entry always comes first in the
list.
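
   The effect of `kill-ring-max' can be tried without disturbing the
user's kill ring by binding the relevant variables locally.  The
following is a sketch only: it uses `kill-new' (*note Low-Level Kill
Ring::) to add entries, the function name `my-kill-ring-max-demo' is
hypothetical, and the interprogram functions are bound to `nil' so
that no window-system selection is involved.

```elisp
(defun my-kill-ring-max-demo ()
  "Add three entries to a kill ring limited to two entries.
Hypothetical helper, for illustration only."
  (let ((kill-ring nil)
        (kill-ring-yank-pointer nil)
        (kill-ring-max 2)
        (interprogram-cut-function nil)
        (interprogram-paste-function nil))
    (kill-new "even older text")
    (kill-new "a different piece of text")
    (kill-new "some text")              ; the oldest entry is dropped
    kill-ring))

(my-kill-ring-max-demo)
;; => ("some text" "a different piece of text")
```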


File: elisp,  Node: Kill Functions,  Next: Yanking,  Prev: Kill Ring Concepts,  Up: The Kill Ring

32.8.2 Functions for Killing
----------------------------

`kill-region' is the usual subroutine for killing text.  Any command
that calls this function is a "kill command" (and should probably have
`kill' in its name).  `kill-region' puts the newly killed text in a new
element at the beginning of the kill ring or adds it to the most recent
element.  It determines automatically (using `last-command') whether
the previous command was a kill command, and if so appends the killed
text to the most recent entry.

 -- Command: kill-region start end &optional yank-handler
     This function kills the text in the region defined by START and
     END.  The text is deleted but saved in the kill ring, along with
     its text properties.  The value is always `nil'.

     In an interactive call, START and END are point and the mark.

     If the buffer or text is read-only, `kill-region' modifies the kill
     ring just the same, then signals an error without modifying the
     buffer.  This is convenient because it lets the user use a series
     of kill commands to copy text from a read-only buffer into the
     kill ring.

     If YANK-HANDLER is non-`nil', this puts that value onto the string
     of killed text, as a `yank-handler' text property.  *Note
     Yanking::.  Note that if YANK-HANDLER is `nil', any `yank-handler'
     properties present on the killed text are copied onto the kill
     ring, like other text properties.

 -- User Option: kill-read-only-ok
     If this option is non-`nil', `kill-region' does not signal an
     error if the buffer or text is read-only.  Instead, it simply
     returns, updating the kill ring but not changing the buffer.

 -- Command: copy-region-as-kill start end
     This command saves the region defined by START and END on the kill
     ring (including text properties), but does not delete the text
     from the buffer.  It returns `nil'.

     The command does not set `this-command' to `kill-region', so a
     subsequent kill command does not append to the same kill ring
     entry.

     Don't call `copy-region-as-kill' in Lisp programs unless you aim to
     support Emacs 18.  For newer Emacs versions, it is better to use
     `kill-new' or `kill-append' instead.  *Note Low-Level Kill Ring::.
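
   The way `kill-region' consults `last-command' can be imitated from
Lisp by binding that variable.  The sketch below is for illustration
and is not how commands are normally written, since the command loop
maintains `last-command' itself.  The function name
`my-kill-append-demo' is hypothetical, and the kill ring variables are
bound locally so that the user's kill ring is left alone.

```elisp
(defun my-kill-append-demo ()
  "Kill two pieces of text as if by consecutive kill commands.
Hypothetical helper, for illustration only."
  (let ((kill-ring nil)
        (kill-ring-yank-pointer nil)
        (interprogram-cut-function nil)
        (interprogram-paste-function nil)
        (this-command nil)
        (last-command nil))
    (with-temp-buffer
      (insert "one two three")
      (kill-region 1 5)                 ; new entry: "one "
      ;; Pretend that the previous command was a kill command.
      (let ((last-command 'kill-region))
        (kill-region 1 5))              ; appends "two "
      (list kill-ring (buffer-string)))))

(my-kill-append-demo)
;; => (("one two ") "three")
```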


File: elisp,  Node: Yanking,  Next: Yank Commands,  Prev: Kill Functions,  Up: The Kill Ring

32.8.3 Yanking
--------------

Yanking means inserting text from the kill ring, but it does not insert
the text blindly.  Yank commands and some other commands use
`insert-for-yank' to perform special processing on the text that they
copy into the buffer.

 -- Function: insert-for-yank string
     This function normally works like `insert' except that it doesn't
     insert the text properties in the `yank-excluded-properties' list.
     However, if any part of STRING has a non-`nil' `yank-handler' text
     property, that property can do various special processing on that
     part of the text being inserted.

 -- Function: insert-buffer-substring-as-yank buf &optional start end
     This function resembles `insert-buffer-substring' except that it
     doesn't insert the text properties in the
     `yank-excluded-properties' list.

   You can put a `yank-handler' text property on all or part of the
text to control how it will be inserted if it is yanked.  The
`insert-for-yank' function looks for that property.  The property value
must be a list of one to four elements, with the following format
(where elements after the first may be omitted):

     (FUNCTION PARAM NOEXCLUDE UNDO)

   Here is what the elements do:

FUNCTION
     When FUNCTION is present and non-`nil', it is called instead of
     `insert' to insert the string.  FUNCTION takes one argument--the
     string to insert.

PARAM
     If PARAM is present and non-`nil', it replaces STRING (or the part
     of STRING being processed) as the object passed to FUNCTION (or
     `insert'); for example, if FUNCTION is `yank-rectangle', PARAM
     should be a list of strings to insert as a rectangle.

NOEXCLUDE
     If NOEXCLUDE is present and non-`nil', the normal removal of the
     properties listed in `yank-excluded-properties' is not performed;
     instead FUNCTION is responsible for removing those properties.
     This may be necessary if FUNCTION adjusts point before or after
     inserting the object.

UNDO
     If UNDO is present and non-`nil', it is a function that will be
     called by `yank-pop' to undo the insertion of the current object.
     It is called with two arguments, the start and end of the current
     region.  FUNCTION can set `yank-undo-function' to override the
     UNDO value.
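
   The FUNCTION and PARAM elements can be tried out with a short
sketch like the one below.  The names `my-yank-upcase' and
`my-yank-handler-demo' are hypothetical; only `insert-for-yank',
`propertize' and the `yank-handler' property come from Emacs itself.

```elisp
(defun my-yank-upcase (string)
  "Hypothetical FUNCTION element: insert STRING in upper case."
  (insert (upcase string)))

(defun my-yank-handler-demo (handler)
  "Return the text inserted for \"hello\" carrying HANDLER.
HANDLER is used as the value of the `yank-handler' text property."
  (with-temp-buffer
    (insert-for-yank (propertize "hello" 'yank-handler handler))
    (buffer-substring-no-properties (point-min) (point-max))))

;; FUNCTION only: it is called instead of `insert'.
(my-yank-handler-demo '(my-yank-upcase))
;; => "HELLO"

;; PARAM only: it replaces the string handed to `insert'.
(my-yank-handler-demo '(nil "replacement"))
;; => "replacement"

;; Both: FUNCTION receives PARAM rather than the original string.
(my-yank-handler-demo '(my-yank-upcase "param"))
;; => "PARAM"
```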


File: elisp,  Node: Yank Commands,  Next: Low-Level Kill Ring,  Prev: Yanking,  Up: The Kill Ring

32.8.4 Functions for Yanking
----------------------------

This section describes higher-level commands for yanking, which are
intended primarily for the user but useful also in Lisp programs.  Both
`yank' and `yank-pop' honor the `yank-excluded-properties' variable and
`yank-handler' text property (*note Yanking::).

 -- Command: yank &optional arg
     This command inserts before point the text at the front of the kill
     ring.  It sets the mark at the beginning of that text, using
     `push-mark' (*note The Mark::), and puts point at the end.

     If ARG is a non-`nil' list (which occurs interactively when the
     user types `C-u' with no digits), then `yank' inserts the text as
     described above, but puts point before the yanked text and sets
     the mark after it.

     If ARG is a number, then `yank' inserts the ARGth most recently
     killed text--the ARGth element of the kill ring list, counted
     cyclically from the front, which is considered the first element
     for this purpose.

     `yank' does not alter the contents of the kill ring, unless it
     used text provided by another program, in which case it pushes
     that text onto the kill ring.  However, if ARG is an integer
     different from one, it rotates the kill ring to place the yanked
     string at the front.

     `yank' returns `nil'.
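
   The effect of ARG on the inserted text, point, and the mark can be
observed from Lisp with a sketch along these lines.  The helper
`my-yank-demo' is hypothetical; it works on a private kill ring and
binds the interprogram variables to `nil' so that window system
selections play no part.

```elisp
(defun my-yank-demo (&optional arg)
  "Yank into a scratch buffer with ARG; return (TEXT POINT MARK)."
  (let ((kill-ring nil)
        (kill-ring-yank-pointer nil)
        ;; Keep the sketch away from window system selections.
        (interprogram-cut-function nil)
        (interprogram-paste-function nil))
    (kill-new "older")
    (kill-new "newer")
    (with-temp-buffer
      (yank arg)
      (list (buffer-substring-no-properties (point-min) (point-max))
            (point)
            (mark t)))))

(my-yank-demo)        ; => ("newer" 6 1)   point after, mark before
(my-yank-demo '(4))   ; => ("newer" 1 6)   like `C-u': point before
(my-yank-demo 2)      ; => ("older" 6 1)   second most recent kill
```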

 -- Command: yank-pop &optional arg
     This command replaces the just-yanked entry from the kill ring
     with a different entry from the kill ring.

     This is allowed only immediately after a `yank' or another
     `yank-pop'.  At such a time, the region contains text that was just
     inserted by yanking.  `yank-pop' deletes that text and inserts in
     its place a different piece of killed text.  It does not add the
     deleted text to the kill ring, since it is already in the kill
     ring somewhere.  It does however rotate the kill ring to place the
     newly yanked string at the front.

     If ARG is `nil', then the replacement text is the previous element
     of the kill ring.  If ARG is numeric, the replacement is the ARGth
     previous kill.  If ARG is negative, a more recent kill is the
     replacement.

     The sequence of kills in the kill ring wraps around, so that after
     the oldest one comes the newest one, and before the newest one
     goes the oldest.

     The return value is always `nil'.

 -- Variable: yank-undo-function
     If this variable is non-`nil', the function `yank-pop' uses its
     value instead of `delete-region' to delete the text inserted by
     the previous `yank' or `yank-pop' command.  The value must be a
     function of two arguments, the start and end of the current region.

     The function `insert-for-yank' automatically sets this variable
     according to the UNDO element of the `yank-handler' text property,
     if there is one.


File: elisp,  Node: Low-Level Kill Ring,  Next: Internals of Kill Ring,  Prev: Yank Commands,  Up: The Kill Ring

32.8.5 Low-Level Kill Ring
--------------------------

These functions and variables provide access to the kill ring at a
lower level, but still convenient for use in Lisp programs, because they
take care of interaction with window system selections (*note Window
System Selections::).

 -- Function: current-kill n &optional do-not-move
     The function `current-kill' rotates the yanking pointer, which
     designates the "front" of the kill ring, by N places (from newer
     kills to older ones), and returns the text at that place in the
     ring.

     If the optional second argument DO-NOT-MOVE is non-`nil', then
     `current-kill' doesn't alter the yanking pointer; it just returns
     the Nth kill, counting from the current yanking pointer.

     If N is zero, indicating a request for the latest kill,
     `current-kill' calls the value of `interprogram-paste-function'
     (documented below) before consulting the kill ring.  If that value
     is a function and calling it returns a string or a list of several
     strings, `current-kill' pushes the strings onto the kill ring and
     returns the first string.  It also sets the yanking pointer to
     point to the kill-ring entry of the first string returned by
     `interprogram-paste-function', regardless of the value of
     DO-NOT-MOVE.  Otherwise, `current-kill' does not treat a zero
     value for N specially: it returns the entry pointed at by the
     yanking pointer and does not move the yanking pointer.
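
   Here is a small sketch of how N and DO-NOT-MOVE interact.  The
function `my-current-kill-demo' is hypothetical, and it binds
`interprogram-paste-function' to `nil' so that only the kill ring
itself is consulted.

```elisp
(defun my-current-kill-demo ()
  "Return the results of several `current-kill' calls on a private ring."
  (let* ((kill-ring (list "newest" "middle" "oldest"))
         (kill-ring-yank-pointer kill-ring)
         (interprogram-cut-function nil)
         (interprogram-paste-function nil))
    (list (current-kill 1 t)     ; peek at the next older kill
          (current-kill 0)       ; the pointer did not move
          (current-kill 1)       ; now rotate by one place
          (current-kill 0)       ; the pointer stays on that entry
          (current-kill 2))))    ; rotation wraps around the ring

(my-current-kill-demo)
;; => ("middle" "newest" "middle" "middle" "newest")
```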

 -- Function: kill-new string &optional replace yank-handler
     This function pushes the text STRING onto the kill ring and makes
     the yanking pointer point to it.  It discards the oldest entry if
     appropriate.  It also invokes the value of
     `interprogram-cut-function' (see below).

     If REPLACE is non-`nil', then `kill-new' replaces the first
     element of the kill ring with STRING, rather than pushing STRING
     onto the kill ring.

     If YANK-HANDLER is non-`nil', this puts that value onto the string
     of killed text, as a `yank-handler' property.  *Note Yanking::.
     Note that if YANK-HANDLER is `nil', then `kill-new' copies any
     `yank-handler' properties present on STRING onto the kill ring, as
     it does with other text properties.

 -- Function: kill-append string before-p &optional yank-handler
     This function appends the text STRING to the first entry in the
     kill ring and makes the yanking pointer point to the combined
     entry.  Normally STRING goes at the end of the entry, but if
     BEFORE-P is non-`nil', it goes at the beginning.  This function
     also invokes the value of `interprogram-cut-function' (see below).
     This handles YANK-HANDLER just like `kill-new', except that if
     YANK-HANDLER is different from the `yank-handler' property of the
     first entry of the kill ring, `kill-append' pushes the
     concatenated string onto the kill ring, instead of replacing the
     original first entry with it.
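
   The following sketch builds a private kill ring with `kill-new' and
`kill-append'.  The name `my-kill-ring-build-demo' is made up, no
`yank-handler' properties are involved, and the interprogram variables
are bound to `nil' so that nothing is sent to the window system.

```elisp
(defun my-kill-ring-build-demo ()
  "Build a private kill ring; return it and whether the pointer is at front."
  (let ((kill-ring nil)
        (kill-ring-yank-pointer nil)
        (interprogram-cut-function nil)
        (interprogram-paste-function nil))
    (kill-new "one")
    (kill-new "two")
    (kill-new "TWO" t)           ; REPLACE: overwrite the first entry
    (kill-append " three" nil)   ; append at the end of the first entry
    (kill-append "zero " t)      ; BEFORE-P: prepend to the first entry
    (list kill-ring
          (eq kill-ring-yank-pointer kill-ring))))

(my-kill-ring-build-demo)
;; => (("zero TWO three" "one") t)
```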

 -- Variable: interprogram-paste-function
     This variable provides a way of transferring killed text from other
     programs, when you are using a window system.  Its value should be
     `nil' or a function of no arguments.

     If the value is a function, `current-kill' calls it to get the
     "most recent kill."  If the function returns a non-`nil' value,
     then that value is used as the "most recent kill."  If it returns
     `nil', then the front of the kill ring is used.

     To facilitate support for window systems that support multiple
     selections, this function may also return a list of strings.  In
     that case, the first string is used as the "most recent kill", and
     all the other strings are pushed onto the kill ring, for easy
     access by `yank-pop'.

     The normal use of this function is to get the window system's
     primary selection as the most recent kill, even if the selection
     belongs to another application.  *Note Window System Selections::.
     However, if the selection was provided by the current Emacs
     session, this function should return `nil'.  (If it is hard to
     tell whether Emacs or some other program provided the selection,
     it should be good enough to use `string=' to compare it with the
     last text Emacs provided.)
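
   As an illustration only, the sketch below installs a stand-in paste
function that always returns the same string; a real one would query
the window system.  The name `my-paste-demo' and the strings are
hypothetical.

```elisp
(defun my-paste-demo ()
  "Return what `current-kill' yields, and the kill ring afterwards."
  (let* ((kill-ring (list "local kill"))
         (kill-ring-yank-pointer kill-ring)
         (interprogram-cut-function nil)
         ;; Hypothetical stand-in for a real selection query.
         (interprogram-paste-function
          (lambda () "from another program")))
    (list (current-kill 0) kill-ring)))

(my-paste-demo)
;; => ("from another program" ("from another program" "local kill"))
```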

 -- Variable: interprogram-cut-function
     This variable provides a way of communicating killed text to other
     programs, when you are using a window system.  Its value should be
     `nil' or a function of one required and one optional argument.

     If the value is a function, `kill-new' and `kill-append' call it
     with the new first element of the kill ring as the first argument.
     The second, optional, argument has the same meaning as the PUSH
     argument to `x-set-cut-buffer' (*note Definition of
     x-set-cut-buffer::) and only affects the second and later cut
     buffers.

     The normal use of this function is to set the window system's
     primary selection (and first cut buffer) from the newly killed
     text.  *Note Window System Selections::.
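
   The calls made to the cut function can be observed with a sketch
such as this one, which records each string it receives instead of
setting a selection.  The name `my-cut-demo' is hypothetical; the
optional second parameter is declared but ignored.

```elisp
(defun my-cut-demo ()
  "Return the strings passed to a recording `interprogram-cut-function'."
  (let* ((sent nil)
         (kill-ring nil)
         (kill-ring-yank-pointer nil)
         (interprogram-paste-function nil)
         ;; Hypothetical cut function: record the text, ignore PUSH.
         (interprogram-cut-function
          (lambda (text &optional _push)
            (setq sent (cons text sent)))))
    (kill-new "abc")
    (kill-append "def" nil)
    (nreverse sent)))

(my-cut-demo)
;; => ("abc" "abcdef")
```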


File: elisp,  Node: Internals of Kill Ring,  Prev: Low-Level Kill Ring,  Up: The Kill Ring

32.8.6 Internals of the Kill Ring
---------------------------------

The variable `kill-ring' holds the kill ring contents, in the form of a
list of strings.  The most recent kill is always at the front of the
list.

   The `kill-ring-yank-pointer' variable points to a link in the kill
ring list, whose CAR is the text to yank next.  We say it identifies
the "front" of the ring.  Moving `kill-ring-yank-pointer' to a
different link is called "rotating the kill ring".  We call the kill
ring a "ring" because the functions that move the yank pointer wrap
around from the end of the list to the beginning, or vice-versa.
Rotation of the kill ring is virtual; it does not change the value of
`kill-ring'.

   Both `kill-ring' and `kill-ring-yank-pointer' are Lisp variables
whose values are normally lists.  The word "pointer" in the name of the
`kill-ring-yank-pointer' indicates that the variable's purpose is to
identify one element of the list for use by the next yank command.

   The value of `kill-ring-yank-pointer' is always `eq' to one of the
links in the kill ring list.  The element it identifies is the CAR of
that link.  Kill commands, which change the kill ring, also set this
variable to the value of `kill-ring'.  The effect is to rotate the ring
so that the newly killed text is at the front.

   Here is a diagram that shows the variable `kill-ring-yank-pointer'
pointing to the second entry in the kill ring `("some text" "a
different piece of text" "yet older text")'.

     kill-ring                  ---- kill-ring-yank-pointer
       |                       |
       |                       v
       |     --- ---          --- ---      --- ---
        --> |   |   |------> |   |   |--> |   |   |--> nil
             --- ---          --- ---      --- ---
              |                |            |
              |                |            |
              |                |             -->"yet older text"
              |                |
              |                 --> "a different piece of text"
              |
               --> "some text"

This state of affairs might occur after `C-y' (`yank') immediately
followed by `M-y' (`yank-pop').
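
   The same state can be reproduced from Lisp.  This sketch, with the
hypothetical name `my-yank-pointer-demo', rotates a private copy of
that ring by one place and checks that the yank pointer is `eq' to the
second link while `kill-ring' itself is unchanged.

```elisp
(defun my-yank-pointer-demo ()
  "Rotate a private kill ring once and inspect the yank pointer."
  (let* ((kill-ring (list "some text"
                          "a different piece of text"
                          "yet older text"))
         (kill-ring-yank-pointer kill-ring)
         (interprogram-cut-function nil)
         (interprogram-paste-function nil))
    (current-kill 1)             ; rotate one place toward older kills
    (list (eq kill-ring-yank-pointer (cdr kill-ring))
          (car kill-ring-yank-pointer)
          (car kill-ring))))

(my-yank-pointer-demo)
;; => (t "a different piece of text" "some text")
```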

 -- Variable: kill-ring
     This variable holds the list of killed text sequences, most
     recently killed first.

 -- Variable: kill-ring-yank-pointer
     This variable's value indicates which element of the kill ring is
     at the "front" of the ring for yanking.  More precisely, the value
     is a tail of the value of `kill-ring', and its CAR is the kill
     string that `C-y' should yank.

 -- User Option: kill-ring-max
     The value of this variable is the maximum length to which the kill
     ring can grow, before elements are thrown away at the end.  The
     default value for `kill-ring-max' is 60.


File: elisp,  Node: Undo,  Next: Maintaining Undo,  Prev: The Kill Ring,  Up: Text

32.9 Undo
=========

Most buffers have an "undo list", which records all changes made to the
buffer's text so that they can be undone.  (The buffers that don't have
one are usually special-purpose buffers for which Emacs assumes that
undoing is not useful.  In particular, any buffer whose name begins
with a space has its undo recording off by default; see *note Buffer
Names::.)  All the primitives that modify the text in the buffer
automatically add elements to the front of the undo list, which is in
the variable `buffer-undo-list'.

 -- Variable: buffer-undo-list
     This buffer-local variable's value is the undo list of the current
     buffer.  A value of `t' disables the recording of undo information.

   Here are the kinds of elements an undo list can have:

`POSITION'
     This kind of element records a previous value of point; undoing
     this element moves point to POSITION.  Ordinary cursor motion does
     not make any sort of undo record, but deletion operations use
     these entries to record where point was before the command.

`(BEG . END)'
     This kind of element indicates how to delete text that was
     inserted.  Upon insertion, the text occupied the range BEG-END in
     the buffer.

`(TEXT . POSITION)'
     This kind of element indicates how to reinsert text that was
     deleted.  The deleted text itself is the string TEXT.  The place to
     reinsert it is `(abs POSITION)'.  If POSITION is positive, point
     was at the beginning of the deleted text, otherwise it was at the
     end.

`(t HIGH . LOW)'
     This kind of element indicates that an unmodified buffer became
     modified.  The elements HIGH and LOW are two integers, each
     recording 16 bits of the visited file's modification time as of
     when it was previously visited or saved.  `primitive-undo' uses
     those values to determine whether to mark the buffer as unmodified
     once again; it does so only if the file's modification time
     matches those numbers.

`(nil PROPERTY VALUE BEG . END)'
     This kind of element records a change in a text property.  Here's
     how you might undo the change:

          (put-text-property BEG END PROPERTY VALUE)

`(MARKER . ADJUSTMENT)'
     This kind of element records the fact that the marker MARKER was
     relocated due to deletion of surrounding text, and that it moved
     ADJUSTMENT character positions.  Undoing this element moves MARKER
     by -ADJUSTMENT characters.

`(apply FUNNAME . ARGS)'
     This is an extensible undo item, which is undone by calling
     FUNNAME with arguments ARGS.

`(apply DELTA BEG END FUNNAME . ARGS)'
     This is an extensible undo item, which records a change limited to
     the range BEG to END, which increased the size of the buffer by
     DELTA.  It is undone by calling FUNNAME with arguments ARGS.

     This kind of element enables undo limited to a region to determine
     whether the element pertains to that region.

`nil'
     This element is a boundary.  The elements between two boundaries
     are called a "change group"; normally, each change group
     corresponds to one keyboard command, and undo commands normally
     undo an entire group as a unit.
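
   To see some of these elements, you can inspect `buffer-undo-list'
after a few changes, as in the sketch below.  The function name
`my-undo-list-demo' is hypothetical.  The exact list varies between
Emacs versions (for example, the form of the first-change element), so
the comments name only the elements that the descriptions above lead
one to expect.

```elisp
(defun my-undo-list-demo ()
  "Return the undo list produced by one insertion and one deletion."
  (with-temp-buffer
    (buffer-enable-undo)          ; temporary buffers start with undo off
    (insert "hello")              ; expected to record (1 . 6)
    (undo-boundary)               ; records a nil boundary
    (delete-region 1 3)           ; expected to record ("he" . 1)
    buffer-undo-list))

(my-undo-list-demo)
```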

 -- Function: undo-boundary
     This function places a boundary element in the undo list.  The undo
     command stops at such a boundary, and successive undo commands undo
     to earlier and earlier boundaries.  This function returns `nil'.

     The editor command loop automatically creates an undo boundary
     before each key sequence is executed.  Thus, each undo normally
     undoes the effects of one command.  Self-inserting input
     characters are an exception.  The command loop makes a boundary
     for the first such character; the next 19 consecutive
     self-inserting input characters do not make boundaries, and then
     the 20th does, and so on as long as self-inserting characters
     continue.

     All buffer modifications add a boundary whenever the previous
     undoable change was made in some other buffer.  This is to ensure
     that each command makes a boundary in each buffer where it makes
     changes.

     Calling this function explicitly is useful for splitting the
     effects of a command into more than one unit.  For example,
     `query-replace' calls `undo-boundary' after each replacement, so
     that the user can undo individual replacements one by one.

 -- Variable: undo-in-progress
     This variable is normally `nil', but the undo commands bind it to
     `t'.  This is so that various kinds of change hooks can tell when
     they're being called for the sake of undoing.

 -- Function: primitive-undo count list
     This is the basic function for undoing elements of an undo list.
     It undoes the first COUNT elements of LIST, returning the rest of
     LIST.

     `primitive-undo' adds elements to the buffer's undo list when it
     changes the buffer.  Undo commands avoid confusion by saving the
     undo list value at the beginning of a sequence of undo operations.
     Then the undo operations use and update the saved value.  The new
     elements added by undoing are not part of this saved value, so
     they don't interfere with continuing to undo.

     This function does not bind `undo-in-progress'.
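
   Here is a minimal sketch of calling `primitive-undo' directly, under
the assumption that a count of 1 processes the elements up to and
including the next boundary, as current implementations do.  The name
`my-primitive-undo-demo' is hypothetical.  The undo list is saved
before undoing, as described above.

```elisp
(defun my-primitive-undo-demo ()
  "Undo the most recent insertion; return the text and a check on the rest."
  (with-temp-buffer
    (buffer-enable-undo)
    (insert "hello")
    (undo-boundary)
    (insert " world")
    ;; Save the list first, so that the elements added by undoing do
    ;; not get mixed up with the ones still to be undone.
    (let ((pending buffer-undo-list))
      (setq pending (primitive-undo 1 pending))
      (list (buffer-string)
            (equal (car pending) '(1 . 6))))))

(my-primitive-undo-demo)
;; => ("hello" t)
```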


File: elisp,  Node: Maintaining Undo,  Next: Filling,  Prev: Undo,  Up: Text

32.10 Maintaining Undo Lists
============================

This section describes how to enable and disable undo information for a
given buffer.  It also explains how the undo list is truncated
automatically so it doesn't get too big.

   Recording of undo information in a newly created buffer is normally
enabled to start with; but if the buffer name starts with a space, the
undo recording is initially disabled.  You can explicitly enable or
disable undo recording with the following two functions, or by setting
`buffer-undo-list' yourself.

 -- Command: buffer-enable-undo &optional buffer-or-name
     This command enables recording undo information for buffer
     BUFFER-OR-NAME, so that subsequent changes can be undone.  If no
     argument is supplied, then the current buffer is used.  This
     function does nothing if undo recording is already enabled in the
     buffer.  It returns `nil'.

     In an interactive call, BUFFER-OR-NAME is the current buffer.  You
     cannot specify any other buffer.

 -- Command: buffer-disable-undo &optional buffer-or-name
     This function discards the undo list of BUFFER-OR-NAME, and
     disables further recording of undo information.  As a result, it
     is no longer possible to undo either previous changes or any
     subsequent changes.  If the undo list of BUFFER-OR-NAME is already
     disabled, this function has no effect.

     This function returns `nil'.
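
   The effect of these two commands on `buffer-undo-list' can be
checked with a sketch like the following (the name
`my-undo-toggle-demo' is hypothetical).  It relies on the fact that
`with-temp-buffer' uses a buffer whose name starts with a space.

```elisp
(defun my-undo-toggle-demo ()
  "Return `buffer-undo-list' before and after enabling and disabling undo."
  (with-temp-buffer
    (let ((states nil))
      (push buffer-undo-list states)   ; t: recording starts disabled
      (buffer-enable-undo)
      (push buffer-undo-list states)   ; nil: enabled, nothing recorded
      (insert "x")
      (buffer-disable-undo)
      (push buffer-undo-list states)   ; t: list discarded, disabled
      (nreverse states))))

(my-undo-toggle-demo)
;; => (t nil t)
```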

   As editing continues, undo lists get longer and longer.  To prevent
them from using up all available memory space, garbage collection trims
them back to size limits you can set.  (For this purpose, the "size" of
an undo list measures the cons cells that make up the list, plus the
strings of deleted text.)  Three variables control the range of
acceptable sizes: `undo-limit', `undo-strong-limit' and
`undo-outer-limit'.  In these variables, size is counted as the number
of bytes occupied, which includes both saved text and other data.

 -- User Option: undo-limit
     This is the soft limit for the acceptable size of an undo list.
     The change group at which this size is exceeded is the last one
     kept.

 -- User Option: undo-strong-limit
     This is the upper limit for the acceptable size of an undo list.
     The change group at which this size is exceeded is discarded
     itself (along with all older change groups).  There is one
     exception: the very latest change group is only discarded if it
     exceeds `undo-outer-limit'.

 -- User Option: undo-outer-limit
     If at garbage collection time the undo info for the current command
     exceeds this limit, Emacs discards the info and displays a warning.
     This is a last ditch limit to prevent memory overflow.

 -- User Option: undo-ask-before-discard
     If this variable is non-`nil', when the undo info exceeds
     `undo-outer-limit', Emacs asks in the echo area whether to discard
     the info.  The default value is `nil', which means to discard it
     automatically.

     This option is mainly intended for debugging.  Garbage collection
     is inhibited while the question is asked, which means that Emacs
     might leak memory if the user waits too long before answering the
     question.


File: elisp,  Node: Filling,  Next: Margins,  Prev: Maintaining Undo,  Up: Text

32.11 Filling
=============

"Filling" means adjusting the lengths of lines (by moving the line
breaks) so that they are nearly (but no greater than) a specified
maximum width.  Additionally, lines can be "justified", which means
inserting spaces to make the left and/or right margins line up
precisely.  The width is controlled by the variable `fill-column'.  For
ease of reading, lines should be no longer than 70 or so columns.

   You can use Auto Fill mode (*note Auto Filling::) to fill text
automatically as you insert it, but changes to existing text may leave
it improperly filled.  Then you must fill the text explicitly.

   Most of the commands in this section return values that are not
meaningful.  All the functions that do filling take note of the current
left margin, current right margin, and current justification style
(*note Margins::).  If the current justification style is `none', the
filling functions don't actually do anything.

   Several of the filling functions have an argument JUSTIFY.  If it is
non-`nil', that requests some kind of justification.  It can be `left',
`right', `full', or `center', to request a specific style of
justification.  If it is `t', that means to use the current
justification style for this part of the text (see
`current-justification', below).  Any other value is treated as `full'.

   When you call the filling functions interactively, using a prefix
argument implies the value `full' for JUSTIFY.

 -- Command: fill-paragraph &optional justify region
     This command fills the paragraph at or after point.  If JUSTIFY is
     non-`nil', each line is justified as well.  It uses the ordinary
     paragraph motion commands to find paragraph boundaries.  *Note
     Paragraphs: (emacs)Paragraphs.

     When REGION is non-`nil', then if Transient Mark mode is enabled
     and the mark is active, this command calls `fill-region' to fill
     all the paragraphs in the region, instead of filling only the
     current paragraph.  When this command is called interactively,
     REGION is `t'.

 -- Command: fill-region start end &optional justify nosqueeze to-eop
     This command fills each of the paragraphs in the region from START
     to END.  It justifies as well if JUSTIFY is non-`nil'.

     If NOSQUEEZE is non-`nil', that means to leave whitespace other
     than line breaks untouched.  If TO-EOP is non-`nil', that means to
     keep filling to the end of the paragraph--or the next hard
     newline, if `use-hard-newlines' is enabled (see below).

     The variable `paragraph-separate' controls how to distinguish
     paragraphs.  *Note Standard Regexps::.
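
   As a small sketch, the following fills a string in a temporary
buffer and returns the resulting lines.  The helper name
`my-fill-region-demo' is hypothetical, and the buffer is assumed to be
in Fundamental mode with the default filling variables apart from
`fill-column'.

```elisp
(defun my-fill-region-demo (text width)
  "Fill TEXT to WIDTH columns in a scratch buffer; return the lines."
  (with-temp-buffer
    (insert text)
    (let ((fill-column width))
      (fill-region (point-min) (point-max)))
    (split-string (buffer-string) "\n" t)))

(my-fill-region-demo "aaa bbb ccc ddd eee fff" 10)
;; => ("aaa bbb" "ccc ddd" "eee fff")
```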

 -- Command: fill-individual-paragraphs start end &optional justify
          citation-regexp
     This command fills each paragraph in the region according to its
     individual fill prefix.  Thus, if the lines of a paragraph were
     indented with spaces, the filled paragraph will remain indented in
     the same fashion.

     The first two arguments, START and END, are the beginning and end
     of the region to be filled.  The third and fourth arguments,
     JUSTIFY and CITATION-REGEXP, are optional.  If JUSTIFY is
     non-`nil', the paragraphs are justified as well as filled.  If
     CITATION-REGEXP is non-`nil', it means the function is operating
     on a mail message and therefore should not fill the header lines.
     If CITATION-REGEXP is a string, it is used as a regular
     expression; if it matches the beginning of a line, that line is
     treated as a citation marker.

     Ordinarily, `fill-individual-paragraphs' regards each change in
     indentation as starting a new paragraph.  If
     `fill-individual-varying-indent' is non-`nil', then only separator
     lines separate paragraphs.  That mode can handle indented
     paragraphs with additional indentation on the first line.

 -- User Option: fill-individual-varying-indent
     This variable alters the action of `fill-individual-paragraphs' as
     described above.

 -- Command: fill-region-as-paragraph start end &optional justify
          nosqueeze squeeze-after
     This command considers a region of text as a single paragraph and
     fills it.  If the region was made up of many paragraphs, the blank
     lines between paragraphs are removed.  This function justifies as
     well as filling when JUSTIFY is non-`nil'.

     If NOSQUEEZE is non-`nil', that means to leave whitespace other
     than line breaks untouched.  If SQUEEZE-AFTER is non-`nil', it
     specifies a position in the region, and means don't canonicalize
     spaces before that position.

     In Adaptive Fill mode, this command calls `fill-context-prefix' to
     choose a fill prefix by default.  *Note Adaptive Fill::.

 -- Command: justify-current-line &optional how eop nosqueeze
     This command inserts spaces between the words of the current line
     so that the line ends exactly at `fill-column'.  It returns `nil'.

     The argument HOW, if non-`nil', specifies explicitly the style of
     justification.  It can be `left', `right', `full', `center', or
     `none'.  If it is `t', that means to follow the specified
     justification style (see `current-justification', below).  `nil'
     means to do full justification.

     If EOP is non-`nil', that means do only left-justification if
     `current-justification' specifies full justification.  This is
     used for the last line of a paragraph; even if the paragraph as a
     whole is fully justified, the last line should not be.

     If NOSQUEEZE is non-`nil', that means do not change interior
     whitespace.

 -- User Option: default-justification
     This variable's value specifies the style of justification to use
     for text that doesn't specify a style with a text property.  The
     possible values are `left', `right', `full', `center', or `none'.
     The default value is `left'.

 -- Function: current-justification
     This function returns the proper justification style to use for
     filling the text around point.

     This returns the value of the `justification' text property at
     point, or the value of the variable `default-justification' if
     there is no such text property.  However, it returns `nil' rather
     than `none' to mean "don't justify".
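
   The lookup order can be sketched as follows; the function name
`my-justification-demo' is hypothetical.  The first and third calls
are made on text without a `justification' property, so they fall back
on `default-justification'.

```elisp
(defun my-justification-demo ()
  "Return `current-justification' at three places in a scratch buffer."
  (with-temp-buffer
    (insert "plain " (propertize "centered" 'justification 'center))
    (list
     ;; No text property here: the default style is used.
     (let ((default-justification 'left))
       (goto-char 2)
       (current-justification))
     ;; Inside the propertized text: the property wins.
     (progn (goto-char 9)
            (current-justification))
     ;; A default of `none' is reported as nil.
     (let ((default-justification 'none))
       (goto-char 2)
       (current-justification)))))

(my-justification-demo)
;; => (left center nil)
```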

 -- User Option: sentence-end-double-space
     If this variable is non-`nil', a period followed by just one space
     does not count as the end of a sentence, and the filling functions
     avoid breaking the line at such a place.

 -- User Option: sentence-end-without-period
     If this variable is non-`nil', a sentence can end without a
     period.  This is used for languages like Thai, where sentences end
     with a double space but without a period.

 -- User Option: sentence-end-without-space
     If this variable is non-`nil', it should be a string of characters
     that can end a sentence without following spaces.

 -- Variable: fill-paragraph-function
     This variable provides a way to override the filling of paragraphs.
     If its value is non-`nil', `fill-paragraph' calls this function to
     do the work.  If the function returns a non-`nil' value,
     `fill-paragraph' assumes the job is done, and immediately returns
     that value.

     The usual use of this feature is to fill comments in programming
     language modes.  If the function needs to fill a paragraph in the
     usual way, it can do so as follows:

          (let ((fill-paragraph-function nil))
            (fill-paragraph arg))

 -- Variable: fill-forward-paragraph-function
     This variable provides a way to override how the filling functions,
     such as `fill-region' and `fill-paragraph', move forward to the
     next paragraph.  Its value should be a function, which is called
     with a single argument N, the number of paragraphs to move, and
     should return the difference between N and the number of
     paragraphs actually moved.  The default value of this variable is
     `forward-paragraph'.  *Note Paragraphs: (emacs)Paragraphs.

 -- Variable: use-hard-newlines
     If this variable is non-`nil', the filling functions do not delete
     newlines that have the `hard' text property.  These "hard
     newlines" act as paragraph separators.


File: elisp,  Node: Margins,  Next: Adaptive Fill,  Prev: Filling,  Up: Text

32.12 Margins for Filling
=========================

 -- User Option: fill-prefix
     This buffer-local variable, if non-`nil', specifies a string of
     text that appears at the beginning of normal text lines and should
     be disregarded when filling them.  Any line that fails to start
     with the fill prefix is considered the start of a paragraph; so is
     any line that starts with the fill prefix followed by additional
     whitespace.  Lines that start with the fill prefix but no
     additional whitespace are ordinary text lines that can be filled
     together.  The resulting filled lines also start with the fill
     prefix.

     The fill prefix follows the left margin whitespace, if any.

 -- User Option: fill-column
     This buffer-local variable specifies the maximum width of filled
     lines.  Its value should be an integer, which is a number of
     columns.  All the filling, justification, and centering commands
     are affected by this variable, including Auto Fill mode (*note
     Auto Filling::).

     As a practical matter, if you are writing text for other people to
     read, you should set `fill-column' to no more than 70.  Otherwise
     the line will be too long for people to read comfortably, and this
     can make the text seem clumsy.

     The default value for `fill-column' is 70.

 -- Command: set-left-margin from to margin
     This sets the `left-margin' property on the text from FROM to TO
     to the value MARGIN.  If Auto Fill mode is enabled, this command
     also refills the region to fit the new margin.

 -- Command: set-right-margin from to margin
     This sets the `right-margin' property on the text from FROM to TO
     to the value MARGIN.  If Auto Fill mode is enabled, this command
     also refills the region to fit the new margin.

 -- Function: current-left-margin
     This function returns the proper left margin value to use for
     filling the text around point.  The value is the sum of the
     `left-margin' property of the character at the start of the
     current line (or zero if none), and the value of the variable
     `left-margin'.

 -- Function: current-fill-column
     This function returns the proper fill column value to use for
     filling the text around point.  The value is the value of the
     `fill-column' variable, minus the value of the `right-margin'
     property of the character after point.
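
   Here is a sketch of how the margin properties feed into these two
functions.  The name `my-margins-demo' and the numbers are made up;
the properties are set directly with `put-text-property' rather than
with `set-left-margin', so that no refilling takes place.

```elisp
(defun my-margins-demo ()
  "Return (LEFT-MARGIN FILL-COLUMN) for a line with margin properties."
  (with-temp-buffer
    (insert "some text\n")
    (let ((fill-column 70)
          (left-margin 0))
      (put-text-property (point-min) (point-max) 'left-margin 4)
      (put-text-property (point-min) (point-max) 'right-margin 10)
      (goto-char (point-min))
      (list (current-left-margin)      ; 0 + 4
            (current-fill-column)))))  ; 70 - 10

(my-margins-demo)
;; => (4 60)
```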

 -- Command: move-to-left-margin &optional n force
     This function moves point to the left margin of the current line.
     The column moved to is determined by calling the function
     `current-left-margin'.  If the argument N is non-`nil',
     `move-to-left-margin' moves forward N-1 lines first.

     If FORCE is non-`nil', that says to fix the line's indentation if
     that doesn't match the left margin value.

 -- Function: delete-to-left-margin &optional from to
     This function removes left margin indentation from the text between
     FROM and TO.  The amount of indentation to delete is determined by
     calling `current-left-margin'.  In no case does this function
     delete non-whitespace.  If FROM and TO are omitted, they default
     to the whole buffer.

 -- Function: indent-to-left-margin
     This function adjusts the indentation at the beginning of the
     current line to the value specified by the variable `left-margin'.
     (That may involve either inserting or deleting whitespace.)  This
     function is the value of `indent-line-function' in Paragraph-Indent
     Text mode.

 -- User Option: left-margin
     This variable specifies the base left margin column.  In
     Fundamental mode, `C-j' indents to this column.  This variable
     automatically becomes buffer-local when set in any fashion.

 -- User Option: fill-nobreak-predicate
     This variable gives major modes a way to specify not to break a
     line at certain places.  Its value should be a list of functions.
     Whenever filling considers breaking the line at a certain place in
     the buffer, it calls each of these functions with no arguments and
     with point located at that place.  If any of the functions returns
     non-`nil', then the line won't be broken there.
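
     For instance, a mode might refuse to break a line right after the
     abbreviation `No.', so that it stays on the same line as the
     number that follows it.  This is only a sketch; the function name
     `my-fill-nobreak-after-no-p' is hypothetical, and the choice of
     abbreviation is an assumption made for illustration:

          (defun my-fill-nobreak-after-no-p ()
            "Return non-nil if the text before point ends in \"No.\"."
            (let ((case-fold-search nil))
              (save-excursion
                ;; Point may be before or after the whitespace.
                (skip-chars-backward " \t")
                (looking-back "\\bNo\\." (line-beginning-position)))))

          ;; Add the predicate buffer-locally, e.g. in a mode hook.
          (add-hook 'fill-nobreak-predicate
                    'my-fill-nobreak-after-no-p nil t)

     With point just before the `5' in `See No. 5', calling this
     function returns `t', so filling would not break the line there.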


File: elisp,  Node: Adaptive Fill,  Next: Auto Filling,  Prev: Margins,  Up: Text

32.13 Adaptive Fill Mode
========================

When "Adaptive Fill Mode" is enabled, Emacs determines the fill prefix
automatically from the text in each paragraph being filled rather than
using a predetermined value.  During filling, this fill prefix gets
inserted at the start of the second and subsequent lines of the
paragraph as described in *note Filling::, and in *note Auto Filling::.

 -- User Option: adaptive-fill-mode
     Adaptive Fill mode is enabled when this variable is non-`nil'.  It
     is `t' by default.

 -- Function: fill-context-prefix from to
     This function implements the heart of Adaptive Fill mode; it
     chooses a fill prefix based on the text between FROM and TO,
     typically the start and end of a paragraph.  It does this by
     looking at the first two lines of the paragraph, based on the
     variables described below.

     Usually, this function returns the fill prefix, a string.  However,
     before doing this, the function makes a final check (not otherwise
     mentioned below) that a line starting with this prefix wouldn't
     look like the start of a paragraph.  Should this happen, the
     function signals the anomaly by returning `nil' instead.

     In detail, `fill-context-prefix' does this:

       1. It takes a candidate for the fill prefix from the first
          line--it tries first the function in `adaptive-fill-function'
          (if any), then the regular expression `adaptive-fill-regexp'
          (see below).  The first non-`nil' result of these, or the
          empty string if they're both `nil', becomes the first line's
          candidate.

       2. If the paragraph has as yet only one line, the function tests
          the validity of the prefix candidate just found.  The
          function then returns the candidate if it's valid, or a
          string of spaces otherwise.  (See the description of
          `adaptive-fill-first-line-regexp' below.)

       3. When the paragraph already has two lines, the function next
          looks for a prefix candidate on the second line, in just the
          same way it did for the first line.  If it doesn't find one,
          it returns `nil'.

       4. The function now compares the two candidate prefixes
          heuristically: if the non-whitespace characters in the line 2
          candidate occur in the same order in the line 1 candidate,
          the function returns the line 2 candidate.  Otherwise, it
          returns the largest initial substring which is common to both
          candidates (which might be the empty string).
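
     The steps above can be tried on small buffers.  The results shown
     here assume a fresh Fundamental mode buffer in which
     `adaptive-fill-regexp', `adaptive-fill-first-line-regexp',
     `adaptive-fill-function', `paragraph-start' and
     `comment-start-skip' all have their default values; with other
     settings the results may differ.

          ;; Two lines: both candidates are ";; ", so the
          ;; line 2 candidate is returned.
          (with-temp-buffer
            (insert ";; first line\n;; second line\n")
            (fill-context-prefix (point-min) (point-max)))
          => ";; "

          ;; One line: the candidate ";; " is not pure whitespace,
          ;; so it is replaced by spaces of the same width.
          (with-temp-buffer
            (insert ";; only line\n")
            (fill-context-prefix (point-min) (point-max)))
          => "   "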

 -- User Option: adaptive-fill-regexp
     Adaptive Fill mode matches this regular expression against the text
     starting after the left margin whitespace (if any) on a line; the
     characters it matches are that line's candidate for the fill
     prefix.

     The default value matches whitespace with certain punctuation
     characters intermingled.

 -- User Option: adaptive-fill-first-line-regexp
     Used only in one-line paragraphs, this regular expression acts as
     an additional check of the validity of the one available candidate
     fill prefix: the candidate must match this regular expression, or
     match `comment-start-skip'.  If it doesn't, `fill-context-prefix'
     replaces the candidate with a string of spaces "of the same width"
     as it.

     The default value of this variable is `"\\`[ \t]*\\'"', which
     matches only a string of whitespace.  The effect of this default
     is to force the fill prefixes found in one-line paragraphs always
     to be pure whitespace.

 -- User Option: adaptive-fill-function
     You can specify more complex ways of choosing a fill prefix
     automatically by setting this variable to a function.  The
     function is called with point after the left margin (if any) of a
     line, and it must preserve point.  It should return either "that
     line's" fill prefix or `nil', meaning it has failed to determine a
     prefix.
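
     As a sketch, the following function (whose name is hypothetical)
     proposes two spaces as the fill prefix for a line that begins with
     a `- ' bullet, so that continuation lines align with the item
     text.  It relies on `looking-at', which does not move point:

          (defun my-adaptive-fill-bullet ()
            "Return a fill prefix for a line starting with \"- \"."
            (if (looking-at "- ")
                "  "))

          ;; Install it in the current buffer only.
          (set (make-local-variable 'adaptive-fill-function)
               'my-adaptive-fill-bullet)

     On any other line the function returns `nil', and Adaptive Fill
     mode falls back on `adaptive-fill-regexp'.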


File: elisp,  Node: Auto Filling,  Next: Sorting,  Prev: Adaptive Fill,  Up: Text

32.14 Auto Filling
==================

Auto Fill mode is a minor mode that fills lines automatically as text
is inserted.  This section describes the hook used by Auto Fill mode.
For a description of functions that you can call explicitly to fill and
justify existing text, see *note Filling::.

   Auto Fill mode also enables the functions that change the margins and
justification style to refill portions of the text.  *Note Margins::.

 -- Variable: auto-fill-function
     The value of this buffer-local variable should be a function (of no
     arguments) to be called after self-inserting a character from the
     table `auto-fill-chars'.  It may be `nil', in which case nothing
     special is done.

     The value of `auto-fill-function' is `do-auto-fill' when Auto Fill
     mode is enabled.  That is a function whose sole purpose is to
     implement the usual strategy for breaking a line.

          In older Emacs versions, this variable was named
          `auto-fill-hook', but since it is not called with the
          standard convention for hooks, it was renamed to
          `auto-fill-function' in version 19.

 -- Variable: normal-auto-fill-function
     This variable specifies the function to use for
     `auto-fill-function', if and when Auto Fill is turned on.  Major
     modes can set buffer-local values for this variable to alter how
     Auto Fill works.
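
     For example, a major mode could arrange for lines never to be
     broken inside a string, and otherwise defer to the usual
     `do-auto-fill'.  This is a minimal sketch; the function name
     `my-mode-auto-fill' is hypothetical, and it assumes the mode's
     syntax table describes strings correctly:

          (defun my-mode-auto-fill ()
            "Break the line as usual, but never inside a string."
            ;; Element 3 of the parser state is non-nil in a string.
            (unless (nth 3 (syntax-ppss))
              (do-auto-fill)))

          ;; In the major mode's setup code:
          (set (make-local-variable 'normal-auto-fill-function)
               'my-mode-auto-fill)

     When Auto Fill mode is then turned on in that buffer,
     `auto-fill-function' is set to `my-mode-auto-fill'.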

 -- Variable: auto-fill-chars
     A char table of characters which invoke `auto-fill-function' when
     self-inserted--space and newline in most language environments.
     They have an entry `t' in the table.
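
     Since the value is a char-table, `aref' tells you whether a given
     character triggers auto filling.  The results shown assume the
     default table, in which only space and newline are set:

          (aref auto-fill-chars ?\s)
          => t
          (aref auto-fill-chars ?a)
          => nil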


File: elisp,  Node: Sorting,  Next: Columns,  Prev: Auto Filling,  Up: Text

32.15 Sorting Text
==================

The sorting functions described in this section all rearrange text in a
buffer.  This is in contrast to the function `sort', which rearranges
the order of the elements of a list (*note Rearrangement::).  The
values returned by these functions are not meaningful.

 -- Function: sort-subr reverse nextrecfun endrecfun &optional
          startkeyfun endkeyfun predicate
     This function is the general text-sorting routine that subdivides a
     buffer into records and then sorts them.  Most of the commands in
     this section use this function.

     To understand how `sort-subr' works, consider the whole accessible
     portion of the buffer as being divided into disjoint pieces called
     "sort records".  The records may or may not be contiguous, but they
     must not overlap.  A portion of each sort record (perhaps all of
     it) is designated as the sort key.  Sorting rearranges the records
     in order by their sort keys.

     Usually, the records are rearranged in order of ascending sort key.
     If the first argument to the `sort-subr' function, REVERSE, is
     non-`nil', the sort records are rearranged in order of descending
     sort key.

     The next four arguments to `sort-subr' are functions that are
     called to move point across a sort record.  They are called many
     times from within `sort-subr'.

       1. NEXTRECFUN is called with point at the end of a record.  This
          function moves point to the start of the next record.  The
          first record is assumed to start at the position of point
          when `sort-subr' is called.  Therefore, you should usually
          move point to the beginning of the buffer before calling
          `sort-subr'.

          This function can indicate there are no more sort records by
          leaving point at the end of the buffer.

       2. ENDRECFUN is called with point within a record.  It moves
          point to the end of the record.

       3. STARTKEYFUN is called to move point from the start of a
          record to the start of the sort key.  This argument is
          optional; if it is omitted, the whole record is the sort key.
          If supplied, the function should either return a non-`nil'
          value to be used as the sort key, or return `nil' to indicate
          that the sort key is in the buffer starting at point.  In the
          latter case, ENDKEYFUN is called to find the end of the sort
          key.

       4. ENDKEYFUN is called to move point from the start of the sort
          key to the end of the sort key.  This argument is optional.
          If STARTKEYFUN returns `nil' and this argument is omitted (or
          `nil'), then the sort key extends to the end of the record.
          There is no need for ENDKEYFUN if STARTKEYFUN returns a
          non-`nil' value.

     The argument PREDICATE is the function to use to compare keys.  If
     keys are numbers, it defaults to `<'; otherwise it defaults to
     `string<'.

     As an example of `sort-subr', here is the complete function
     definition for `sort-lines':

          ;; Note that the first two lines of doc string
          ;; are effectively one line when viewed by a user.
          (defun sort-lines (reverse beg end)
            "Sort lines in region alphabetically;\
           argument means descending order.
          Called from a program, there are three arguments:
          REVERSE (non-nil means reverse order),\
           BEG and END (region to sort).
          The variable `sort-fold-case' determines\
           whether alphabetic case affects
          the sort order."
            (interactive "P\nr")
            (save-excursion
              (save-restriction
                (narrow-to-region beg end)
                (goto-char (point-min))
                (let ((inhibit-field-text-motion t))
                  (sort-subr reverse 'forward-line 'end-of-line)))))

     Here `forward-line' moves point to the start of the next record,
     and `end-of-line' moves point to the end of the record.  We do not
     pass the arguments STARTKEYFUN and ENDKEYFUN, because the entire
     record is used as the sort key.

     The `sort-paragraphs' function is very much the same, except that
     its `sort-subr' call looks like this:

          (sort-subr reverse
                     (function
                       (lambda ()
                         (while (and (not (eobp))
                                     (looking-at paragraph-separate))
                           (forward-line 1))))
                     'forward-paragraph)

     Markers pointing into any sort records are left with no useful
     position after `sort-subr' returns.
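
     As a further sketch, STARTKEYFUN can compute the key itself
     instead of delimiting it in the buffer.  Here each line is a
     record and the key is the number that follows the first space on
     the line; since the keys are numbers, PREDICATE defaults to `<'.
     The buffer contents are made up for illustration:

          (with-temp-buffer
            (insert "pear 30\napple 4\nfig 12\n")
            ;; The first record starts at point.
            (goto-char (point-min))
            (sort-subr nil 'forward-line 'end-of-line
                       (lambda ()
                         ;; Skip the first word, then return the
                         ;; rest of the line as a number.
                         (skip-chars-forward "^ ")
                         (string-to-number
                          (buffer-substring (point)
                                            (line-end-position)))))
            (buffer-string))
          => "apple 4\nfig 12\npear 30\n"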

 -- User Option: sort-fold-case
     If this variable is non-`nil', `sort-subr' and the other buffer
     sorting functions ignore case when comparing strings.

 -- Command: sort-regexp-fields reverse record-regexp key-regexp start
          end
     This command sorts the region between START and END alphabetically
     as specified by RECORD-REGEXP and KEY-REGEXP.  If REVERSE is a
     negative integer, then sorting is in reverse order.

     Alphabetical sorting means that two sort keys are compared by
     comparing the first characters of each, the second characters of
     each, and so on.  If a mismatch is found, it means that the sort
     keys are unequal; the sort key whose character is less at the
     point of first mismatch is the lesser sort key.  The individual
     characters are compared according to their numerical character
     codes in the Emacs character set.

     The value of the RECORD-REGEXP argument specifies how to divide
     the buffer into sort records.  At the end of each record, a search
     is done for this regular expression, and the text that matches it
     is taken as the next record.  For example, the regular expression
     `^.+$', which matches lines with at least one character besides a
     newline, would make each such line into a sort record.  *Note
     Regular Expressions::, for a description of the syntax and meaning
     of regular expressions.

     The value of the KEY-REGEXP argument specifies what part of each
     record is the sort key.  The KEY-REGEXP could match the whole
     record, or only a part.  In the latter case, the rest of the
     record has no effect on the sorted order of records, but it is
     carried along when the record moves to its new position.

     The KEY-REGEXP argument can refer to the text matched by a
     subexpression of RECORD-REGEXP, or it can be a regular expression
     on its own.

     If KEY-REGEXP is:

    `\DIGIT'
          then the text matched by the DIGITth `\(...\)' parenthesis
          grouping in RECORD-REGEXP is the sort key.

    `\&'
          then the whole record is the sort key.

    a regular expression
          then `sort-regexp-fields' searches for a match for the regular
          expression within the record.  If such a match is found, it
          is the sort key.  If there is no match for KEY-REGEXP within
          a record then that record is ignored, which means its
          position in the buffer is not changed.  (The other records
          may move around it.)

     For example, if you plan to sort all the lines in the region by the
     first word on each line starting with the letter `f', you should
     set RECORD-REGEXP to `^.*$' and set KEY-REGEXP to `\<f\w*\>'.  The
     resulting expression looks like this:

          (sort-regexp-fields nil "^.*$" "\\<f\\w*\\>"
                              (region-beginning)
                              (region-end))

     If you call `sort-regexp-fields' interactively, it prompts for
     RECORD-REGEXP and KEY-REGEXP in the minibuffer.
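
     To see the effect from Lisp, the same regular expressions can be
     applied to a small temporary buffer.  The three lines used here
     are invented for illustration; their keys are `fig', `fan' and
     `fee' respectively:

          (with-temp-buffer
            (insert "one fig\ntwo fan\nthree fee")
            (sort-regexp-fields nil "^.*$" "\\<f\\w*\\>"
                                (point-min) (point-max))
            (buffer-string))
          => "two fan\nthree fee\none fig"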

 -- Command: sort-lines reverse start end
     This command alphabetically sorts lines in the region between
     START and END.  If REVERSE is non-`nil', the sort is in reverse
     order.

 -- Command: sort-paragraphs reverse start end
     This command alphabetically sorts paragraphs in the region between
     START and END.  If REVERSE is non-`nil', the sort is in reverse
     order.

 -- Command: sort-pages reverse start end
     This command alphabetically sorts pages in the region between
     START and END.  If REVERSE is non-`nil', the sort is in reverse
     order.

 -- Command: sort-fields field start end
     This command sorts lines in the region between START and END,
     comparing them alphabetically by the FIELDth field of each line.
     Fields are separated by whitespace and numbered starting from 1.
     If FIELD is negative, sorting is by the -FIELDth field from the
     end of the line.  This command is useful for sorting tables.

 -- Command: sort-numeric-fields field start end
     This command sorts lines in the region between START and END,
     comparing them numerically by the FIELDth field of each line.
     Fields are separated by whitespace and numbered starting from 1.
     The specified field must contain a number in each line of the
     region.  Numbers starting with 0 are treated as octal, and numbers
     starting with `0x' are treated as hexadecimal.

     If FIELD is negative, sorting is by the -FIELDth field from the
     end of the line.  This command is useful for sorting tables.
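
     The difference between `sort-fields' and `sort-numeric-fields'
     shows up when the field holds numbers of different lengths.  The
     following sketch sorts the same made-up three-line table on its
     second field both ways, assuming `sort-numeric-base' is 10:

          ;; Alphabetically, "10" < "100" < "9".
          (with-temp-buffer
            (insert "b 10\na 9\nc 100\n")
            (sort-fields 2 (point-min) (point-max))
            (buffer-string))
          => "b 10\nc 100\na 9\n"

          ;; Numerically, 9 < 10 < 100.
          (with-temp-buffer
            (insert "b 10\na 9\nc 100\n")
            (sort-numeric-fields 2 (point-min) (point-max))
            (buffer-string))
          => "a 9\nb 10\nc 100\n"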

 -- User Option: sort-numeric-base
     This variable specifies the default radix for
     `sort-numeric-fields' to parse numbers.

 -- Command: sort-columns reverse &optional beg end
     This command sorts the lines in the region between BEG and END,
     comparing them alphabetically by a certain range of columns.  The
     column positions of BEG and END bound the range of columns to sort
     on.

     If REVERSE is non-`nil', the sort is in reverse order.

     One unusual thing about this command is that the entire line
     containing position BEG, and the entire line containing position
     END, are included in the region sorted.

     Note that `sort-columns' rejects text that contains tabs, because
     tabs could be split across the specified columns.  Use `M-x
     untabify' to convert tabs to spaces before sorting.

     When possible, this command actually works by calling the `sort'
     utility program.


File: elisp,  Node: Columns,  Next: Indentation,  Prev: Sorting,  Up: Text

32.16 Counting Columns
======================

The column functions convert between a character position (counting
characters from the beginning of the buffer) and a column position
(counting screen characters from the beginning of a line).

   These functions count each character according to the number of
columns it occupies on the screen.  This means control characters count
as occupying 2 or 4 columns, depending upon the value of `ctl-arrow',
and tabs count as occupying a number of columns that depends on the
value of `tab-width' and on the column where the tab begins.  *Note
Usual Display::.

   Column number computations ignore the width of the window and the
amount of horizontal scrolling.  Consequently, a column value can be
arbitrarily high.  The first (or leftmost) column is numbered 0.
Column computations also ignore overlays and text properties, aside
from invisibility.

 -- Function: current-column
     This function returns the horizontal position of point, measured in
     columns, counting from 0 at the left margin.  The column position
     is the sum of the widths of all the displayed representations of
     the characters between the start of the current line and point.

     For an example of using `current-column', see the description of
     `count-lines' in *note Text Lines::.

 -- Command: move-to-column column &optional force
     This function moves point to COLUMN in the current line.  The
     calculation of COLUMN takes into account the widths of the
     displayed representations of the characters between the start of
     the line and point.

     When called interactively, COLUMN is the value of the numeric
     prefix argument.  If COLUMN is not an integer, an error is
     signaled.

     If column COLUMN is beyond the end of the line, point moves to the
     end of the line.  If COLUMN is negative, point moves to the
     beginning of the line.

     If it is impossible to move to column COLUMN because that is in
     the middle of a multicolumn character such as a tab, point moves
     to the end of that character.  However, if FORCE is non-`nil', and
     COLUMN is in the middle of a tab, then `move-to-column' converts
     the tab into spaces so that it can move precisely to column
     COLUMN.  Other multicolumn characters can cause anomalies despite
     FORCE, since there is no way to split them.

     The argument FORCE also has an effect if the line isn't long
     enough to reach column COLUMN; if it is `t', that means to add
     whitespace at the end of the line to reach that column.

     The return value is the column number actually moved to.
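
     The following sketch shows the treatment of a tab, assuming a
     `tab-width' of 8.  The line consists of a tab followed by `abc',
     so point starts at column 11.  Column 4 lies in the middle of the
     tab: without FORCE, point ends up after the tab, at column 8;
     with FORCE, the tab is converted so that column 4 can be reached
     exactly.

          (with-temp-buffer
            (setq tab-width 8)
            (insert "\tabc")
            (list (current-column)
                  (move-to-column 4)
                  (move-to-column 4 t)))
          => (11 8 4)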


File: elisp,  Node: Indentation,  Next: Case Changes,  Prev: Columns,  Up: Text

32.17 Indentation
=================

The indentation functions are used to examine, move to, and change
whitespace that is at the beginning of a line.  Some of the functions
can also change whitespace elsewhere on a line.  Columns and indentation
count from zero at the left margin.

* Menu:

* Primitive Indent::      Functions used to count and insert indentation.
* Mode-Specific Indent::  Customize indentation for different modes.
* Region Indent::         Indent all the lines in a region.
* Relative Indent::       Indent the current line based on previous lines.
* Indent Tabs::           Adjustable, typewriter-like tab stops.
* Motion by Indent::      Move to first non-blank character.


File: elisp,  Node: Primitive Indent,  Next: Mode-Specific Indent,  Up: Indentation

32.17.1 Indentation Primitives
------------------------------

This section describes the primitive functions used to count and insert
indentation.  The functions in the following sections use these
primitives.  *Note Width::, for related functions.

 -- Function: current-indentation
     This function returns the indentation of the current line, which is
     the horizontal position of the first nonblank character.  If the
     contents are entirely blank, then this is the horizontal position
     of the end of the line.

 -- Command: indent-to column &optional minimum
     This function indents from point with tabs and spaces until COLUMN
     is reached.  If MINIMUM is specified and non-`nil', then at least
     that many spaces are inserted even if this requires going beyond
     COLUMN.  Otherwise the function does nothing if point is already
     beyond COLUMN.  The value is the column at which the inserted
     indentation ends.

     The inserted whitespace characters inherit text properties from the
     surrounding text (usually, from the preceding text only).  *Note
     Sticky Properties::.

 -- User Option: indent-tabs-mode
     If this variable is non-`nil', indentation functions can insert
     tabs as well as spaces.  Otherwise, they insert only spaces.
     Setting this variable automatically makes it buffer-local in the
     current buffer.
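
     The primitives in this section can be tried in a temporary buffer,
     as in the following sketch.  Setting `indent-tabs-mode' to `nil'
     there makes `indent-to' insert only spaces, so the resulting
     buffer text is predictable:

          (with-temp-buffer
            (insert "    indented text")
            (current-indentation))
          => 4

          (with-temp-buffer
            (setq indent-tabs-mode nil)
            (insert "foo")
            (list (indent-to 8)     ; from column 3 to column 8
                  (indent-to 8 2)   ; already at 8: add MINIMUM spaces
                  (buffer-string)))
          => (8 10 "foo       ")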


File: elisp,  Node: Mode-Specific Indent,  Next: Region Indent,  Prev: Primitive Indent,  Up: Indentation

32.17.2 Indentation Controlled by Major Mode
--------------------------------------------

An important function of each major mode is to customize the <TAB> key
to indent properly for the language being edited.  This section
describes the mechanism of the <TAB> key and how to control it.  The
functions in this section return unpredictable values.

 -- Variable: indent-line-function
     This variable's value is the function to be used by <TAB> (and
     various commands) to indent the current line.  The command
     `indent-according-to-mode' does no more than call this function.

     In Lisp mode, the value is the symbol `lisp-indent-line'; in C
     mode, `c-indent-line'; in Fortran mode, `fortran-indent-line'.
     The default value is `indent-relative'.
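
     A major mode typically installs its own function as a
     buffer-local value of this variable.  The following is a minimal
     sketch of such a function, under the assumption that a line
     should simply get the indentation of the previous nonblank line;
     the name `my-indent-line' is hypothetical:

          (defun my-indent-line ()
            "Indent the current line like the previous nonblank line."
            (let ((col (save-excursion
                         (beginning-of-line)
                         (skip-chars-backward " \t\n")
                         (current-indentation))))
              (if (<= (current-column) (current-indentation))
                  ;; Point is in the indentation: move with it.
                  (indent-line-to col)
                ;; Point is in the text of the line: keep it there.
                (save-excursion (indent-line-to col)))))

          ;; In the major mode's setup code:
          (set (make-local-variable 'indent-line-function)
               'my-indent-line)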

 -- Command: indent-according-to-mode
     This command calls the function in `indent-line-function' to
     indent the current line in a way appropriate for the current major
     mode.

 -- Command: indent-for-tab-command &optional rigid
     This command calls the function in `indent-line-function' to
     indent the current line; however, if that function is
     `indent-to-left-margin', `insert-tab' is called instead.  (That is
     a trivial command that inserts a tab character.)  If RIGID is
     non-`nil', this function also rigidly indents the entire balanced
     expression that starts at the beginning of the current line, to
     reflect change in indentation of the current line.

 -- Command: newline-and-indent
     This function inserts a newline, then indents the new line (the one
     following the newline just inserted) according to the major mode.

     It does indentation by calling the current `indent-line-function'.
     In programming language modes, this is the same thing <TAB> does,
     but in some text modes, where <TAB> inserts a tab,
     `newline-and-indent' indents to the column specified by
     `left-margin'.

 -- Command: reindent-then-newline-and-indent
     This command reindents the current line, inserts a newline at
     point, and then indents the new line (the one following the
     newline just inserted).

     This command does indentation on both lines according to the
     current major mode, by calling the current value of
     `indent-line-function'.  In programming language modes, this is
     the same thing <TAB> does, but in some text modes, where <TAB>
     inserts a tab, `reindent-then-newline-and-indent' indents to the
     column specified by `left-margin'.


File: elisp,  Node: Region Indent,  Next: Relative Indent,  Prev: Mode-Specific Indent,  Up: Indentation

32.17.3 Indenting an Entire Region
----------------------------------

This section describes commands that indent all the lines in the
region.  They return unpredictable values.

 -- Command: indent-region start end &optional to-column
     This command indents each nonblank line starting between START
     (inclusive) and END (exclusive).  If TO-COLUMN is `nil',
     `indent-region' indents each nonblank line by calling the current
     mode's indentation function, the value of `indent-line-function'.

     If TO-COLUMN is non-`nil', it should be an integer specifying the
     number of columns of indentation; then this function gives each
     line exactly that much indentation, by either adding or deleting
     whitespace.

     If there is a fill prefix, `indent-region' indents each line by
     making it start with the fill prefix.

 -- Variable: indent-region-function
     The value of this variable is a function that can be used by
     `indent-region' as a short cut.  It should take two arguments, the
     start and end of the region.  You should design the function so
     that it will produce the same results as indenting the lines of the
     region one by one, but presumably faster.

     If the value is `nil', there is no short cut, and `indent-region'
     actually works line by line.

     A short-cut function is useful in modes such as C mode and Lisp
     mode, where the `indent-line-function' must scan from the
     beginning of the function definition: applying it to each line
     would be quadratic in time.  The short cut can update the scan
     information as it moves through the lines indenting them; this
     takes linear time.  In a mode where indenting a line individually
     is fast, there is no need for a short cut.

     `indent-region' with a non-`nil' argument TO-COLUMN has a
     different meaning and does not use this variable.

 -- Command: indent-rigidly start end count
     This command indents all lines starting between START (inclusive)
     and END (exclusive) sideways by COUNT columns.  This "preserves
     the shape" of the affected region, moving it as a rigid unit.
     Consequently, this command is useful not only for indenting
     regions of unindented text, but also for indenting regions of
     formatted code.

     For example, if COUNT is 3, this command adds 3 columns of
     indentation to each of the lines beginning in the region specified.

     In Mail mode, `C-c C-y' (`mail-yank-original') uses
     `indent-rigidly' to indent the text copied from the message being
     replied to.
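
     The following sketch contrasts `indent-rigidly' with
     `indent-region' called with a TO-COLUMN argument, using the same
     made-up two-line text.  The first shifts every line by 3 columns
     and so keeps the relative indentation; the second gives every
     line exactly 3 columns of indentation.  `indent-tabs-mode' is set
     to `nil' only to make the inserted whitespace predictable:

          (with-temp-buffer
            (setq indent-tabs-mode nil)
            (insert "if x\n  y\n")
            (indent-rigidly (point-min) (point-max) 3)
            (buffer-string))
          => "   if x\n     y\n"

          (with-temp-buffer
            (setq indent-tabs-mode nil)
            (insert "if x\n  y\n")
            (indent-region (point-min) (point-max) 3)
            (buffer-string))
          => "   if x\n   y\n"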

 -- Command: indent-code-rigidly start end columns &optional
          nochange-regexp
     This is like `indent-rigidly', except that it doesn't alter lines
     that start within strings or comments.

     In addition, it doesn't alter a line if NOCHANGE-REGEXP matches at
     the beginning of the line (if NOCHANGE-REGEXP is non-`nil').


File: elisp,  Node: Relative Indent,  Next: Indent Tabs,  Prev: Region Indent,  Up: Indentation

32.17.4 Indentation Relative to Previous Lines
----------------------------------------------

This section describes two commands that indent the current line based
on the contents of previous lines.

 -- Command: indent-relative &optional unindented-ok
     This command inserts whitespace at point, extending to the same
     column as the next "indent point" of the previous nonblank line.
     An indent point is a non-whitespace character following
     whitespace.  The next indent point is the first one at a column
     greater than the current column of point.  For example, if point
     is underneath and to the left of the first non-blank character of
     a line of text, it moves to that column by inserting whitespace.

     If the previous nonblank line has no next indent point (i.e., none
     at a great enough column position), `indent-relative' either does
     nothing (if UNINDENTED-OK is non-`nil') or calls
     `tab-to-tab-stop'.  Thus, if point is underneath and to the right
     of the last column of a short line of text, this command ordinarily
     moves point to the next tab stop by inserting whitespace.

     The return value of `indent-relative' is unpredictable.

     In the following example, point is at the beginning of the second
     line:

                      This line is indented twelve spaces.
          -!-The quick brown fox jumped.

     Evaluation of the expression `(indent-relative nil)' produces the
     following:

                      This line is indented twelve spaces.
                      -!-The quick brown fox jumped.

     In this next example, point is between the `m' and `p' of `jumped':

                      This line is indented twelve spaces.
          The quick brown fox jum-!-ped.

     Evaluation of the expression `(indent-relative nil)' produces the
     following:

                      This line is indented twelve spaces.
          The quick brown fox jum  -!-ped.

 -- Command: indent-relative-maybe
     This command indents the current line like the previous nonblank
     line, by calling `indent-relative' with `t' as the UNINDENTED-OK
     argument.  The return value is unpredictable.

     If the previous nonblank line has no indent points beyond the
     current column, this command does nothing.


File: elisp,  Node: Indent Tabs,  Next: Motion by Indent,  Prev: Relative Indent,  Up: Indentation

32.17.5 Adjustable "Tab Stops"
------------------------------

This section explains the mechanism for user-specified "tab stops" and
the mechanisms that use and set them.  The name "tab stops" is used
because the feature is similar to that of the tab stops on a
typewriter.  The feature works by inserting an appropriate number of
spaces and tab characters to reach the next tab stop column; it does not
affect the display of tab characters in the buffer (*note Usual
Display::).  Note that the <TAB> character as input uses this tab stop
feature only in a few major modes, such as Text mode.  *Note Tab Stops:
(emacs)Tab Stops.

 -- Command: tab-to-tab-stop
     This command inserts spaces or tabs before point, up to the next
     tab stop column defined by `tab-stop-list'.  It searches the list
     for an element greater than the current column number, and uses
     that element as the column to indent to.  It does nothing if no
     such element is found.

 -- User Option: tab-stop-list
     This variable is the list of tab stop columns used by
     `tab-to-tab-stop'.  The elements should be integers in increasing
     order.  The tab stop columns need not be evenly spaced.

     Use `M-x edit-tab-stops' to edit the location of tab stops
     interactively.
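
     From Lisp, the list can also be bound temporarily.  In this
     sketch the tab stops `(4 10 20)' are arbitrary values chosen for
     illustration; point starts at column 2, so the next tab stop is
     column 4:

          (with-temp-buffer
            (setq indent-tabs-mode nil)
            (insert "ab")
            (let ((tab-stop-list '(4 10 20)))
              (tab-to-tab-stop))
            (current-column))
          => 4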


File: elisp,  Node: Motion by Indent,  Prev: Indent Tabs,  Up: Indentation

32.17.6 Indentation-Based Motion Commands
-----------------------------------------

These commands, primarily for interactive use, act based on the
indentation in the text.

 -- Command: back-to-indentation
     This command moves point to the first non-whitespace character in
     the current line (which is the line in which point is located).
     It returns `nil'.

 -- Command: backward-to-indentation &optional arg
     This command moves point backward ARG lines and then to the first
     nonblank character on that line.  It returns `nil'.  If ARG is
     omitted or `nil', it defaults to 1.

 -- Command: forward-to-indentation &optional arg
     This command moves point forward ARG lines and then to the first
     nonblank character on that line.  It returns `nil'.  If ARG is
     omitted or `nil', it defaults to 1.


File: elisp,  Node: Case Changes,  Next: Text Properties,  Prev: Indentation,  Up: Text

32.18 Case Changes
==================

The case change commands described here work on text in the current
buffer.  *Note Case Conversion::, for case conversion functions that
work on strings and characters.  *Note Case Tables::, for how to
customize which characters are upper or lower case and how to convert
them.

 -- Command: capitalize-region start end
     This function capitalizes all words in the region defined by START
     and END.  To capitalize means to convert each word's first
     character to upper case and convert the rest of each word to lower
     case.  The function returns `nil'.

     If one end of the region is in the middle of a word, the part of
     the word within the region is treated as an entire word.

     When `capitalize-region' is called interactively, START and END
     are point and the mark, with the smaller first.

          ---------- Buffer: foo ----------
          This is the contents of the 5th foo.
          ---------- Buffer: foo ----------

          (capitalize-region 1 37)
          => nil

          ---------- Buffer: foo ----------
          This Is The Contents Of The 5th Foo.
          ---------- Buffer: foo ----------

 -- Command: downcase-region start end
     This function converts all of the letters in the region defined by
     START and END to lower case.  The function returns `nil'.

     When `downcase-region' is called interactively, START and END are
     point and the mark, with the smaller first.

 -- Command: upcase-region start end
     This function converts all of the letters in the region defined by
     START and END to upper case.  The function returns `nil'.

     When `upcase-region' is called interactively, START and END are
     point and the mark, with the smaller first.
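
     A short sketch of calling these region functions from Lisp,
     assuming a temporary buffer with sample text:

```elisp
(with-temp-buffer
  (insert "Hello World")
  (upcase-region (point-min) (point-max))
  (let ((upper (buffer-string)))
    (downcase-region (point-min) (point-max))
    ;; Return the buffer text after each conversion.
    (list upper (buffer-string))))
;; => ("HELLO WORLD" "hello world")
```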

 -- Command: capitalize-word count
     This function capitalizes COUNT words after point, moving point
     over as it does.  To capitalize means to convert each word's first
     character to upper case and convert the rest of each word to lower
     case.  If COUNT is negative, the function capitalizes the -COUNT
     previous words but does not move point.  The value is `nil'.

     If point is in the middle of a word, the part of the word before
     point is ignored when moving forward.  The rest is treated as an
     entire word.

     When `capitalize-word' is called interactively, COUNT is set to
     the numeric prefix argument.

 -- Command: downcase-word count
     This function converts the COUNT words after point to all lower
     case, moving point over as it does.  If COUNT is negative, it
     converts the -COUNT previous words but does not move point.  The
     value is `nil'.

     When `downcase-word' is called interactively, COUNT is set to the
     numeric prefix argument.

 -- Command: upcase-word count
     This function converts the COUNT words after point to all upper
     case, moving point over as it does.  If COUNT is negative, it
     converts the -COUNT previous words but does not move point.  The
     value is `nil'.

     When `upcase-word' is called interactively, COUNT is set to the
     numeric prefix argument.
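
     The following sketch contrasts a positive and a negative COUNT
     for the word commands, assuming a temporary buffer with three
     sample words.  Each form returns the buffer text and the final
     value of point.

```elisp
;; Positive COUNT: point moves over the words it converts.
(with-temp-buffer
  (insert "foo bar baz")
  (goto-char (point-min))
  (capitalize-word 2)
  (list (buffer-string) (point)))
;; => ("Foo Bar baz" 8)

;; Negative COUNT: the previous word is converted, point stays put.
(with-temp-buffer
  (insert "foo bar baz")               ; point is at the end, 12
  (upcase-word -1)
  (list (buffer-string) (point)))
;; => ("foo bar BAZ" 12)
```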


File: elisp,  Node: Text Properties,  Next: Substitution,  Prev: Case Changes,  Up: Text

32.19 Text Properties
=====================

Each character position in a buffer or a string can have a "text
property list", much like the property list of a symbol (*note Property
Lists::).  The properties belong to a particular character at a
particular place, such as the letter `T' at the beginning of this
sentence or the first `o' in `foo'--if the same character occurs in two
different places, the two occurrences in general have different
properties.

   Each property has a name and a value.  Both of these can be any Lisp
object, but the name is normally a symbol.  Typically each property
name symbol is used for a particular purpose; for instance, the text
property `face' specifies the faces for displaying the character (*note
Special Properties::).  The usual way to access the property list is to
specify a name and ask what value corresponds to it.

   If a character has a `category' property, we call it the "property
category" of the character.  It should be a symbol.  The properties of
the symbol serve as defaults for the properties of the character.

   Copying text between strings and buffers preserves the properties
along with the characters; this includes such diverse functions as
`substring', `insert', and `buffer-substring'.
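
   As a minimal sketch of this behavior, assuming a temporary buffer and
a sample string, the following inserts propertized text and copies part
of it back out, once with and once without its properties:

```elisp
(with-temp-buffer
  (insert (propertize "hello" 'face 'bold))
  (list
   ;; `buffer-substring' keeps the properties of the copied text.
   (get-text-property 0 'face (buffer-substring 1 3))
   ;; `buffer-substring-no-properties' discards them.
   (get-text-property 0 'face (buffer-substring-no-properties 1 3))))
;; => (bold nil)
```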

* Menu:

* Examining Properties::   Looking at the properties of one character.
* Changing Properties::    Setting the properties of a range of text.
* Property Search::        Searching for where a property changes value.
* Special Properties::     Particular properties with special meanings.
* Format Properties::      Properties for representing formatting of text.
* Sticky Properties::      How inserted text gets properties from
                             neighboring text.
* Lazy Properties::        Computing text properties in a lazy fashion
                             only when text is examined.
* Clickable Text::         Using text properties to make regions of text
                             do something when you click on them.
* Fields::                 The `field' property defines
                             fields within the buffer.
* Not Intervals::          Why text properties do not use
                             Lisp-visible text intervals.


File: elisp,  Node: Examining Properties,  Next: Changing Properties,  Up: Text Properties

32.19.1 Examining Text Properties
---------------------------------

The simplest way to examine text properties is to ask for the value of
a particular property of a particular character.  For that, use
`get-text-property'.  Use `text-properties-at' to get the entire
property list of a character.  *Note Property Search::, for functions
to examine the properties of a number of characters at once.

   These functions handle both strings and buffers.  Keep in mind that
positions in a string start from 0, whereas positions in a buffer start
from 1.

 -- Function: get-text-property pos prop &optional object
     This function returns the value of the PROP property of the
     character after position POS in OBJECT (a buffer or string).  The
     argument OBJECT is optional and defaults to the current buffer.

     If there is no PROP property strictly speaking, but the character
     has a property category that is a symbol, then `get-text-property'
     returns the PROP property of that symbol.

 -- Function: get-char-property position prop &optional object
     This function is like `get-text-property', except that it checks
     overlays first and then text properties.  *Note Overlays::.

     The argument OBJECT may be a string, a buffer, or a window.  If it
     is a window, then the buffer displayed in that window is used for
     text properties and overlays, but only the overlays active for that
     window are considered.  If OBJECT is a buffer, then overlays in
     that buffer are considered first, in order of decreasing priority,
     followed by the text properties.  If OBJECT is a string, only text
     properties are considered, since strings never have overlays.

 -- Function: get-char-property-and-overlay position prop &optional
          object
     This is like `get-char-property', but gives extra information
     about the overlay that the property value comes from.

     Its value is a cons cell whose CAR is the property value, the same
     value `get-char-property' would return with the same arguments.
     Its CDR is the overlay in which the property was found, or `nil',
     if it was found as a text property or not found at all.

     If POSITION is at the end of OBJECT, both the CAR and the CDR of
     the value are `nil'.
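
     The sketch below compares the three lookup functions, assuming a
     temporary buffer in which an overlay covers only the first two
     characters of some propertized text:

```elisp
(with-temp-buffer
  (insert (propertize "abc" 'face 'bold))
  (let ((ov (make-overlay 1 3)))       ; covers "ab" only
    (overlay-put ov 'face 'italic)
    (list
     ;; Text properties only: the overlay is ignored.
     (get-text-property 1 'face)
     ;; Overlays are checked first.
     (get-char-property 1 'face)
     ;; The CDR identifies the overlay that supplied the value.
     (eq ov (cdr (get-char-property-and-overlay 1 'face)))
     ;; Position 3 is outside the overlay.
     (get-char-property 3 'face))))
;; => (bold italic t bold)
```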

 -- Variable: char-property-alias-alist
     This variable holds an alist which maps property names to a list of
     alternative property names.  If a character does not specify a
     direct value for a property, the alternative property names are
     consulted in order; the first non-`nil' value is used.  This
     variable takes precedence over `default-text-properties', and
     `category' properties take precedence over this variable.
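
     A minimal sketch of an alias lookup follows.  The property name
     `my-alt-face' is hypothetical and exists only for this example.

```elisp
(with-temp-buffer
  ;; The text carries only the hypothetical `my-alt-face' property.
  (insert (propertize "x" 'my-alt-face 'bold))
  ;; Declare `my-alt-face' as an alternative name for `face'.
  (let ((char-property-alias-alist '((face my-alt-face))))
    (get-text-property 1 'face)))
;; => bold
```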

 -- Function: text-properties-at position &optional object
     This function returns the entire property list of the character at
     POSITION in the string or buffer OBJECT.  If OBJECT is `nil', it
     defaults to the current buffer.
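
     The following sketch shows the difference between the raw
     property list and a lookup that falls back on the property
     category.  The symbol `my-demo-category' is hypothetical.

```elisp
;; Give the hypothetical category symbol a default `face'.
(put 'my-demo-category 'face 'italic)

(let ((s (propertize "x" 'category 'my-demo-category)))
  (list
   ;; The lookup falls back on the category symbol's property.
   (get-text-property 0 'face s)
   ;; The character's own list holds only the `category' entry.
   (text-properties-at 0 s)))
;; => (italic (category my-demo-category))
```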

 -- Variable: default-text-properties
     This variable holds a property list giving default values for text
     properties.  Whenever a character does not specify a value for a
     property, whether directly, through a category symbol, or through
     `char-property-alias-alist', the value stored in this list is used
     instead.  Here is an example:

          (setq default-text-properties '(foo 69)
                char-property-alias-alist nil)
          ;; Make sure character 1 has no properties of its own.
          (set-text-properties 1 2 nil)
          ;; What we get, when we ask, is the default value.
          (get-text-property 1 'foo)
               => 69


File: elisp,  Node: Changing Properties,  Next: Property Search,  Prev: Examining Properties,  Up: Text Properties

32.19.2 Changing Text Properties
--------------------------------

The primitives for changing properties apply to a specified range of
text in a buffer or string.  The function `set-text-properties' (see
end of section) sets the entire property list of the text in that
range; more often, it is useful to add, change, or delete just certain
properties specified by name.

   Since text properties are considered part of the contents of the
buffer (or string), and can affect how a buffer looks on the screen,
any change in buffer text properties marks the buffer as modified.
Buffer text property changes are also undoable (*note Undo::).
Positions in a string start from 0, whereas positions in a buffer start
from 1.

 -- Function: put-text-property start end prop value &optional object
     This function sets the PROP property to VALUE for the text between
     START and END in the string or buffer OBJECT.  If OBJECT is `nil',
     it defaults to the current buffer.
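
     For instance, assuming a temporary buffer with sample text, the
     property applies to the characters from START up to but not
     including END:

```elisp
(with-temp-buffer
  (insert "hello world")
  (put-text-property 1 6 'face 'bold)  ; covers "hello"
  (list (get-text-property 1 'face)    ; the "h"
        (get-text-property 6 'face)))  ; the space after "hello"
;; => (bold nil)
```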

 -- Function: add-text-properties start end props &optional object
     This function adds or overrides text properties for the text
     between START and END in the string or buffer OBJECT.  If OBJECT
     is `nil', it defaults to the current buffer.

     The argument PROPS specifies which properties to add.  It should
     have the form of a property list (*note Property Lists::): a list
     whose elements are property names alternating with the
     corresponding values.

     The return value is `t' if the function actually changed some
     property's value; `nil' otherwise (if PROPS is `nil' or its values
     agree with those in the text).

     For example, here is how to set the `comment' and `face'
     properties of a range of text:

          (add-text-properties START END
                               '(comment t face highlight))
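
     The return value can be observed with a sketch like the
     following, which assumes a temporary buffer and applies the same
     properties twice:

```elisp
(with-temp-buffer
  (insert "some text")
  (list
   ;; The first call changes the text, so it returns t.
   (add-text-properties 1 5 '(comment t face highlight))
   ;; The second call finds nothing to change and returns nil.
   (add-text-properties 1 5 '(comment t face highlight))
   (get-text-property 1 'face)))
;; => (t nil highlight)
```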

 -- Function: remove-text-properties start end props &optional object
     This function deletes specified text properties from the text
     between START and END in the string or buffer OBJECT.  If OBJECT
     is `nil', it defaults to the current buffer.

     The argument PROPS specifies which properties to delete.  It
     should have the form of a property list (*note Property Lists::):
     a list whose elements are property names alternating with
     corresponding values.  But only the names matter--the values that
     accompany them are ignored.  For example, here's how to remove the
     `face' property:

          (remove-text-properties START END '(face nil))

     The return value is `t' if the function actually changed some
     property's value; `nil' otherwise (if PROPS is `nil' or if no
     character in the specified text had any of those properties).

     To remove all text properties from certain text, use
     `set-text-properties' and specify `nil' for the new property list.

 -- Function: remove-list-of-text-properties start end
          list-of-properties &optional object
     Like `remove-text-properties' except that LIST-OF-PROPERTIES is a
     list of property names only, not an alternating list of property
     names and values.
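
     A brief sketch comparing the two removal functions on a string,
     assuming sample properties `face' and `comment':

```elisp
(let ((s (propertize "abc" 'face 'bold 'comment t)))
  (list
   ;; Alternating names and (ignored) values.
   (remove-text-properties 0 3 '(face nil) s)
   (text-properties-at 0 s)
   ;; A plain list of property names.
   (remove-list-of-text-properties 0 3 '(comment) s)
   (text-properties-at 0 s)))
;; => (t (comment t) t nil)
```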

 -- Function: set-text-properties start end props &optional object
     This function completely replaces the text property list for the
     text between START and END in the string or buffer OBJECT.  If
     OBJECT is `nil', it defaults to the current buffer.

     The argument PROPS is the new property list.  It should be a list
     whose elements are property names alternating with corresponding
     values.

     After `set-text-properties' returns, all the characters in the
     specified range have identical properties.

     If PROPS is `nil', the effect is to get rid of all properties from
     the specified range of text.  Here's an example:

          (set-text-properties START END nil)

     Do not rely on the return value of this function.
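
     The following sketch, assuming a temporary buffer holding two
     stretches of text with different faces, shows that the whole
     range ends up with the new property list:

```elisp
(with-temp-buffer
  (insert (propertize "abc" 'face 'bold)
          (propertize "def" 'face 'italic))
  ;; Replace whatever was there with a single new property list.
  (set-text-properties 1 7 '(face underline))
  (list (get-text-property 1 'face)
        (get-text-property 6 'face)))
;; => (underline underline)
```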

   The easiest way to make a string with text properties is with
`propertize':

 -- Function: propertize string &rest properties
     This function returns a copy of STRING which has the text
     properties PROPERTIES.  These properties apply to all the
     characters in the string that is returned.  Here is an example that
     constructs a string with a `face' property and a `mouse-face'
     property:

          (propertize "foo" 'face 'italic
                      'mouse-face 'bold-italic)
               => #("foo" 0 3 (mouse-face bold-italic face italic))

     To put different properties on various parts of a string, you can
     construct each part with `propertize' and then combine them with
     `concat':

          (concat
           (propertize "foo" 'face 'italic
                       'mouse-face 'bold-italic)
           " and "
           (propertize "bar" 'face 'italic
                       'mouse-face 'bold-italic))
               => #("foo and bar"
                           0 3 (face italic mouse-face bold-italic)
                           3 8 nil
                           8 11 (face italic mouse-face bold-italic))

   See also the function `buffer-substring-no-properties' (*note Buffer
Contents::) which copies text from the buffer but does not copy its
properties.


File: elisp,  Node: Property Search,  Next: Special Properties,  Prev: Changing Properties,  Up: Text Properties

32.19.3 Text Property Search Functions
--------------------------------------

In typical use of text properties, runs of consecutive characters
usually share the same value for a property.  Rather than writing your
programs to examine characters one by one, it is much faster to process
chunks of text that have the same property value.

   Here are functions you can use to do this.  They use `eq' for
comparing property values.  In all cases, OBJECT defaults to the
current buffer.

   For high performance, it's very important to use the LIMIT argument
to these functions, especially the ones that search for a single
property--otherwise, they may spend a long time scanning to the end of
the buffer, if the property you are interested in does not change.

   These functions do not move point; instead, they return a position
(or `nil').  Remember that a position is always between two characters;
the position returned by these functions is between two characters with
different properties.

 -- Function: next-property-change pos &optional object limit
     This function scans the text forward from position POS in the
     string or buffer OBJECT till it finds a change in some text
     property, then returns the position of the change.  In other
     words, it returns the position of the first character beyond POS
     whose properties are not identical to those of the character just
     after POS.

     If LIMIT is non-`nil', then the scan ends at position LIMIT.  If
     there is no property change before that point,
     `next-property-change' returns LIMIT.

     The value is `nil' if the properties remain unchanged all the way
     to the end of OBJECT and LIMIT is `nil'.  If the value is
     non-`nil', it is a position greater than or equal to POS.  The
     value equals POS only when LIMIT equals POS.

     Here is an example of how to scan the buffer by chunks of text
     within which all properties are constant:

          (while (not (eobp))
            (let ((plist (text-properties-at (point)))
                  (next-change
                   (or (next-property-change (point) (current-buffer))
                       (point-max))))
              Process text from point to NEXT-CHANGE...
              (goto-char next-change)))
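
     A self-contained variant of this loop, offered as a sketch: it
     assumes a temporary buffer with sample text and simply collects
     each chunk together with its property list.

```elisp
(with-temp-buffer
  (insert "plain " (propertize "bold" 'face 'bold) " tail")
  (goto-char (point-min))
  (let (chunks)
    (while (not (eobp))
      (let ((plist (text-properties-at (point)))
            (next-change
             (or (next-property-change (point) (current-buffer))
                 (point-max))))
        ;; "Process" the chunk by recording its text and properties.
        (push (list (buffer-substring-no-properties (point) next-change)
                    plist)
              chunks)
        (goto-char next-change)))
    (nreverse chunks)))
;; => (("plain " nil) ("bold" (face bold)) (" tail" nil))
```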

 -- Function: previous-property-change pos &optional object limit
     This is like `next-property-change', but scans back from POS
     instead of forward.  If the value is non-`nil', it is a position
     less than or equal to POS; it equals POS only if LIMIT equals POS.

 -- Function: next-single-property-change pos prop &optional object
          limit
     This function scans text for a change in the PROP property, then
     returns the position of the change.  The scan goes forward from
     position POS in the string or buffer OBJECT.  In other words, this
     function returns the position of the first character beyond POS
     whose PROP property differs from that of the character just after
     POS.

     If LIMIT is non-`nil', then the scan ends at position LIMIT.  If
     there is no property change before that point,
     `next-single-property-change' returns LIMIT.

     The value is `nil' if the property remains unchanged all the way to
     the end of OBJECT and LIMIT is `nil'.  If the value is non-`nil',
     it is a position greater than or equal to POS; it equals POS only
     if LIMIT equals POS.
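
     The effect of LIMIT can be seen in this sketch, which assumes a
     temporary buffer whose middle three characters have a `face'
     property:

```elisp
(with-temp-buffer
  (insert "aaa" (propertize "bbb" 'face 'bold) "ccc")
  (list
   ;; The `face' value first changes at position 4.
   (next-single-property-change 1 'face)
   ;; No change occurs before LIMIT, so LIMIT itself is returned.
   (next-single-property-change 1 'face nil 3)
   ;; No further change and no LIMIT: the value is nil.
   (next-single-property-change 7 'face)))
;; => (4 3 nil)
```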

 -- Function: previous-single-property-change pos prop &optional object
          limit
     This is like `next-single-property-change', but scans back from
     POS instead of forward.  If the value is non-`nil', it is a
     position less than or equal to POS; it equals POS only if LIMIT
     equals POS.

 -- Function: next-char-property-change pos &optional limit
     This is like `next-property-change' except that it considers
     overlay properties as well as text properties, and if no change is
     found before the end of the buffer, it returns the maximum buffer
     position rather than `nil' (in this sense, it resembles the
     corresponding overlay function `next-overlay-change', rather than
     `next-property-change').  There is no OBJECT operand because this
     function operates only on the current buffer.  It returns the next
     position at which either kind of property changes.

 -- Function: previous-char-property-change pos &optional limit
     This is like `next-char-property-change', but scans back from POS
     instead of forward, and returns the minimum buffer position if no
     change is found.

 -- Function: next-single-char-property-change pos prop &optional
          object limit
     This is like `next-single-property-change' except that it
     considers overlay properties as well as text properties, and if no
     change is found before the end of the OBJECT, it returns the
     maximum valid position in OBJECT rather than `nil'.  Unlike
     `next-char-property-change', this function _does_ have an OBJECT
     operand; if OBJECT is not a buffer, only text-properties are
     considered.
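
     The difference in the "no change found" case can be sketched as
     follows, assuming a temporary buffer whose text has no properties
     and no overlays:

```elisp
(with-temp-buffer
  (insert "abcdef")
  (list
   ;; Nothing changes before the end of the buffer: nil.
   (next-single-property-change 1 'face)
   ;; Same situation: the maximum valid position is returned.
   (next-single-char-property-change 1 'face)))
;; => (nil 7)
```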

 -- Function: previous-single-char-property-change pos prop &optional
          object limit
     This is like `next-single-char-property-change', but scans back
     from POS instead of forward, and returns the minimum valid
     position in OBJECT if no change is found.

 -- Function: text-property-any start end prop value &optional object
     This function returns non-`nil' if at least one character between
     START and END has a property PROP whose value is VALUE.  More
     precisely, it returns the position of the first such character.
     Otherwise, it returns `nil'.

     The optional fifth argument, OBJECT, specifies the string or
     buffer to scan.  Positions are relative to OBJECT.  The default
     for OBJECT is the current buffer.

 -- Function: text-property-not-all start end prop value &optional
          object
     This function returns non-`nil' if at least one character between
     START and END does not have a property PROP with value VALUE.
     More precisely, it returns the position of the first such
     character.  Otherwise, it returns `nil'.

     The optional fifth argument, OBJECT, specifies the string or
     buffer to scan.  Positions are relative to OBJECT.  The default
     for OBJECT is the current buffer.
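
     A short sketch of both predicates, assuming a temporary buffer
     whose middle three characters have the `face' value `bold':

```elisp
(with-temp-buffer
  (insert "aaa" (propertize "bbb" 'face 'bold) "ccc")
  (list
   ;; First character whose `face' is `bold'.
   (text-property-any 1 10 'face 'bold)
   ;; First character from 4 on whose `face' is not `bold'.
   (text-property-not-all 4 10 'face 'bold)
   ;; No character in "aaa" has that value.
   (text-property-any 1 4 'face 'bold)))
;; => (4 7 nil)
```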


File: elisp,  Node: Special Properties,  Next: Format Properties,  Prev: Property Search,  Up: Text Properties

32.19.4 Properties with Special Meanings
----------------------------------------

Here is a table of text property names that have special built-in
meanings.  The following sections list a few additional special property
names that control filling and property inheritance.  All other names
have no standard meaning, and you can use them as you like.

   Note: the properties `composition', `display', `invisible' and
`intangible' can also cause point to move to an acceptable place, after
each Emacs command.  *Note Adjusting Point::.

`category'
     If a character has a `category' property, we call it the "property
     category" of the character.  It should be a symbol.  The
     properties of this symbol serve as defaults for the properties of
     the character.

`face'
     You can use the property `face' to control the font and color of
     text.  *Note Faces::, for more information.

     In the simplest case, the value is a face name.  It can also be a
     list; then each element can be any of these possibilities:

        * A face name (a symbol or string).

        * A property list of face attributes.  This has the form
          (KEYWORD VALUE ...), where each KEYWORD is a face attribute
          name and VALUE is a meaningful value for that attribute.
          With this feature, you do not need to create a face each time
          you want to specify a particular attribute for certain text.
          *Note Face Attributes::.

        * A cons cell with the form `(foreground-color . COLOR-NAME)'
          or `(background-color . COLOR-NAME)'.  These are old,
          deprecated equivalents for `(:foreground COLOR-NAME)' and
          `(:background COLOR-NAME)'.  Please convert code that uses
          them.

     You can also use either of the latter two forms directly as the
     value of the `face' property.
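
     For example, this sketch puts a property list of face attributes
     directly on a string; the attribute values are arbitrary choices
     for illustration:

```elisp
(let ((s (propertize "warning"
                     'face '(:foreground "red" :weight bold))))
  ;; The property value is stored exactly as given.
  (get-text-property 0 'face s))
;; => (:foreground "red" :weight bold)
```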

     Font Lock mode (*note Font Lock Mode::) works in most buffers by
     dynamically updating the `face' property of characters based on
     the context.

`font-lock-face'
     The `font-lock-face' property is equivalent to the `face' property
     when Font Lock mode is enabled.  When Font Lock mode is disabled,
     `font-lock-face' has no effect.

     The `font-lock-face' property is useful for special modes that
     implement their own highlighting.  *Note Precalculated
     Fontification::.

`mouse-face'
     The property `mouse-face' is used instead of `face' when the mouse
     is on or near the character.  For this purpose, "near" means that
     all text between the character and the mouse position has the same
     `mouse-face' property value.

`fontified'
     This property says whether the text is ready for display.  If
     `nil', Emacs's redisplay routine calls the functions in
     `fontification-functions' (*note Auto Faces::) to prepare this
     part of the buffer before it is displayed.  It is used internally
     by the "just in time" font locking code.

`display'
     This property activates various features that change the way text
     is displayed.  For example, it can make text appear taller or
     shorter, higher or lower, wider or narrower, or replaced with an
     image.  *Note Display Property::.

`help-echo'
     If text has a string as its `help-echo' property, then when you
     move the mouse onto that text, Emacs displays that string in the
     echo area, or in the tooltip window (*note Tooltips:
     (emacs)Tooltips.).

     If the value of the `help-echo' property is a function, that
     function is called with three arguments, WINDOW, OBJECT and POS
     and should return a help string or `nil' for none.  The first
     argument, WINDOW, is the window in which the help was found.  The
     second, OBJECT, is the buffer, overlay or string which had the
     `help-echo' property.  The POS argument is as follows:

        * If OBJECT is a buffer, POS is the position in the buffer.

        * If OBJECT is an overlay, that overlay has a `help-echo'
          property, and POS is the position in the overlay's buffer.

        * If OBJECT is a string (an overlay string or a string displayed
          with the `display' property), POS is the position in that
          string.

     If the value of the `help-echo' property is neither a function nor
     a string, it is evaluated to obtain a help string.

     You can alter the way help text is displayed by setting the
     variable `show-help-function' (*note Help display::).

     This feature is used in the mode line and for other active text.
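
     Here is a sketch of a function-valued `help-echo' property.  The
     function name `my-demo-help-echo' and its message format are
     hypothetical; the example calls the function by hand with the
     arguments it would receive for text in a buffer.

```elisp
;; Hypothetical help function; the name is illustrative only.
(defun my-demo-help-echo (_window object pos)
  "Return a help string describing POS within OBJECT."
  (format "Position %d in a %s" pos
          (cond ((bufferp object) "buffer")
                ((overlayp object) "overlay")
                (t "string"))))

(with-temp-buffer
  (insert (propertize "hover me" 'help-echo #'my-demo-help-echo))
  ;; Call the property value with WINDOW, OBJECT and POS by hand.
  (funcall (get-text-property 1 'help-echo)
           (selected-window) (current-buffer) 1))
;; => "Position 1 in a buffer"
```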

`keymap'
     The `keymap' property specifies an additional keymap for commands.
     When this keymap applies, it is used for key lookup before the
     minor mode keymaps and before the buffer's local map.  *Note
     Active Keymaps::.  If the property value is a symbol, the symbol's
     function definition is used as the keymap.

     The property's value for the character before point applies if it
     is non-`nil' and rear-sticky, and the property's value for the
     character after point applies if it is non-`nil' and front-sticky.
     (For mouse clicks, the position of the click is used instead of
     the position of point.)

`local-map'
     This property works like `keymap' except that it specifies a
     keymap to use _instead of_ the buffer's local map.  For most
     purposes (perhaps all purposes), it is better to use the `keymap'
     property.

`syntax-table'
     The `syntax-table' property overrides what the syntax table says
     about this particular character.  *Note Syntax Properties::.

`read-only'
     If a character has the property `read-only', then modifying that
     character is not allowed.  Any command that would do so gets an
     error, `text-read-only'.  If the property value is a string, that
     string is used as the error message.

     Insertion next to a read-only character is an error if inserting
     ordinary text there would inherit the `read-only' property due to
     stickiness.  Thus, you can control permission to insert next to
     read-only text by controlling the stickiness.  *Note Sticky
     Properties::.

     Since changing properties counts as modifying the buffer, it is not
     possible to remove a `read-only' property unless you know the
     special trick: bind `inhibit-read-only' to a non-`nil' value and
     then remove the property.  *Note Read Only Buffers::.
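
     The trick can be sketched as follows, assuming a temporary buffer
     whose entire text is read-only:

```elisp
(with-temp-buffer
  (insert (propertize "locked" 'read-only t))
  (list
   ;; An ordinary modification signals `text-read-only'.
   (condition-case nil
       (progn (delete-region 1 3) 'deleted)
     (text-read-only 'refused))
   ;; With `inhibit-read-only' bound, the property can be removed.
   (let ((inhibit-read-only t))
     (remove-text-properties (point-min) (point-max)
                             '(read-only nil))
     (delete-region 1 3)
     (buffer-string))))
;; => (refused "cked")
```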

`invisible'
     A non-`nil' `invisible' property can make a character invisible on
     the screen.  *Note Invisible Text::, for details.

`intangible'
     If a group of consecutive characters have equal and non-`nil'
     `intangible' properties, then you cannot place point between them.
     If you try to move point forward into the group, point actually
     moves to the end of the group.  If you try to move point backward
     into the group, point actually moves to the start of the group.

     If consecutive characters have unequal non-`nil' `intangible'
     properties, they belong to separate groups; each group is
     separately treated as described above.

     When the variable `inhibit-point-motion-hooks' is non-`nil', the
     `intangible' property is ignored.

`field'
     Consecutive characters with the same `field' property constitute a
     "field".  Some motion functions including `forward-word' and
     `beginning-of-line' stop moving at a field boundary.  *Note
     Fields::.

`cursor'
     Normally, the cursor is displayed at the end of any overlay and
     text property strings present at the current buffer position.  You
     can place the cursor on any desired character of these strings by
     giving that character a non-`nil' `cursor' text property.  In
     addition, if the value of the `cursor' property of an overlay
     string is an integer, it specifies the number of buffer
     character positions associated with the overlay string; this way,
     Emacs will display the cursor on the character with that property
     regardless of whether the current buffer position is actually
     covered by the overlay.  Specifically, if the value of the `cursor'
     property of a character is the number N, the cursor will be
     displayed on this character for any buffer position in the range
     `[OVPOS..OVPOS+N]', where OVPOS is the starting buffer position
     covered by the overlay (*note Managing Overlays::).

`pointer'
     This property specifies the pointer shape to use when the mouse
     pointer is over this text or image.  *Note Pointer Shape::, for
     possible pointer shapes.

`line-spacing'
     A newline can have a `line-spacing' text or overlay property that
     controls the height of the display line ending with that newline.
     The property value overrides the default frame line spacing and
     the buffer-local `line-spacing' variable.  *Note Line Height::.

`line-height'
     A newline can have a `line-height' text or overlay property that
     controls the total height of the display line ending in that
     newline.  *Note Line Height::.

`wrap-prefix'
     If text has a `wrap-prefix' property, the prefix it defines will
     be added at display-time to the beginning of every continuation
     line due to text wrapping (so if lines are truncated, the
     wrap-prefix is never used).  It may be a string, an image, or a
     stretch-glyph such as used by the `display' text-property.  *Note
     Display Property::.

     A wrap-prefix may also be specified for an entire buffer using the
     `wrap-prefix' buffer-local variable (however, a `wrap-prefix'
     text-property takes precedence over the value of the `wrap-prefix'
     variable).  *Note Truncation::.

`line-prefix'
     If text has a `line-prefix' property, the prefix it defines will
     be added at display-time to the beginning of every non-continuation
     line.  It may be a string, an image, or a stretch-glyph such as
     used by the `display' text-property.  *Note Display Property::.

     A line-prefix may also be specified for an entire buffer using the
     `line-prefix' buffer-local variable (however, a `line-prefix'
     text-property takes precedence over the value of the `line-prefix'
     variable).  *Note Truncation::.

`modification-hooks'
     If a character has the property `modification-hooks', then its
     value should be a list of functions; modifying that character
     calls all of those functions.  Each function receives two
     arguments: the beginning and end of the part of the buffer being
     modified.  Note that if a particular modification hook function
     appears on several characters being modified by a single
     primitive, you can't predict how many times the function will be
     called.

     If these functions modify the buffer, they should bind
     `inhibit-modification-hooks' to `t' around doing so, to avoid
     confusing the internal mechanism that calls these hooks.

     Overlays also support the `modification-hooks' property, but the
     details are somewhat different (*note Overlay Properties::).

`insert-in-front-hooks'
`insert-behind-hooks'
     The operation of inserting text in a buffer also calls the
     functions listed in the `insert-in-front-hooks' property of the
     following character and in the `insert-behind-hooks' property of
     the preceding character.  These functions receive two arguments,
     the beginning and end of the inserted text.  The functions are
     called _after_ the actual insertion takes place.

     See also *note Change Hooks::, for other hooks that are called
     when you change text in a buffer.

`point-entered'
`point-left'
     The special properties `point-entered' and `point-left' record
     hook functions that report motion of point.  Each time point
     moves, Emacs compares these two property values:

        * the `point-left' property of the character after the old
          location, and

        * the `point-entered' property of the character after the new
          location.

     If these two values differ, each of them is called (if not `nil')
     with two arguments: the old value of point, and the new one.

     The same comparison is made for the characters before the old and
     new locations.  The result may be to execute two `point-left'
     functions (which may be the same function) and/or two
     `point-entered' functions (which may be the same function).  In
     any case, all the `point-left' functions are called first,
     followed by all the `point-entered' functions.

     It is possible with `char-after' to examine characters at various
     buffer positions without moving point to those positions.  Only an
     actual change in the value of point runs these hook functions.

      -- Variable: inhibit-point-motion-hooks
          When this variable is non-`nil', `point-left' and
          `point-entered' hooks are not run, and the `intangible'
          property has no effect.  Do not set this variable globally;
          bind it with `let'.

      -- Variable: show-help-function
          If this variable is non-`nil', it specifies a function called
          to display help strings.  These may be `help-echo'
          properties, menu help strings (*note Simple Menu Items::,
          *note Extended Menu Items::), or tool bar help strings (*note
          Tool Bar::).  The specified function is called with one
          argument, the help string to display.  Tooltip mode (*note
          Tooltips: (emacs)Tooltips.) provides an example.

`composition'
     This text property is used to display a sequence of characters as a
     single glyph composed from components.  But the value of the
     property itself is completely internal to Emacs and should not be
     manipulated directly by, for instance, `put-text-property'.
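
   The `modification-hooks' and `insert-behind-hooks' properties
described in the table above can be exercised with a short sketch.  The
names `my-hook-log', `my-log-change' and `my-demo-change-hooks' are
hypothetical, chosen only for this illustration; the sketch assumes
that `inhibit-modification-hooks' is `nil', as it is by default.

```elisp
;; Hypothetical names: `my-hook-log', `my-log-change' and
;; `my-demo-change-hooks' are illustrative, not part of Emacs.
(defvar my-hook-log nil
  "List of (BEG . END) pairs recorded by `my-log-change', newest first.")

(defun my-log-change (beg end)
  "Record the range BEG..END passed to a text property hook."
  (push (cons beg end) my-hook-log))

(defun my-demo-change-hooks ()
  "Return the ranges seen by the hooks, oldest first."
  (setq my-hook-log nil)
  (with-temp-buffer
    (insert "abcdef")
    ;; Modifying any of "abc" calls `my-log-change' with the range
    ;; being modified.
    (put-text-property 1 4 'modification-hooks '(my-log-change))
    ;; Inserting directly after "d" calls `my-log-change' with the
    ;; range of the inserted text, after the insertion.
    (put-text-property 4 5 'insert-behind-hooks '(my-log-change))
    (delete-region 2 3)                 ; delete "b"; "d" is now at 3
    (goto-char 4)                       ; just after "d"
    (insert "X")
    (reverse my-hook-log)))
```

   Calling `(my-demo-change-hooks)' returns one recorded range for the
deletion and one for the insertion.  A hook function that itself edits
the buffer should bind `inhibit-modification-hooks' to `t' around the
edit, as explained above.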



File: elisp,  Node: Format Properties,  Next: Sticky Properties,  Prev: Special Properties,  Up: Text Properties

32.19.5 Formatted Text Properties
---------------------------------

These text properties affect the behavior of the fill commands.  They
are used for representing formatted text.  *Note Filling::, and *note
Margins::.

`hard'
     If a newline character has this property, it is a "hard" newline.
     The fill commands do not alter hard newlines and do not move words
     across them.  However, this property takes effect only if the
     `use-hard-newlines' minor mode is enabled.  *Note Hard and Soft
     Newlines: (emacs)Hard and Soft Newlines.

`right-margin'
     This property specifies an extra right margin for filling this
     part of the text.

`left-margin'
     This property specifies an extra left margin for filling this part
     of the text.

`justification'
     This property specifies the style of justification for filling
     this part of the text.


File: elisp,  Node: Sticky Properties,  Next: Lazy Properties,  Prev: Format Properties,  Up: Text Properties

32.19.6 Stickiness of Text Properties
-------------------------------------

Self-inserting characters normally take on the same properties as the
preceding character.  This is called "inheritance" of properties.

   In a Lisp program, you can do insertion with inheritance or without,
depending on your choice of insertion primitive.  The ordinary text
insertion functions such as `insert' do not inherit any properties.
They insert text with precisely the properties of the string being
inserted, and no others.  This is correct for programs that copy text
from one context to another--for example, into or out of the kill ring.
To insert with inheritance, use the special primitives described in this
section.  Self-inserting characters inherit properties because they work
using these primitives.

   When you do insertion with inheritance, _which_ properties are
inherited, and from where, depends on which properties are "sticky".
Insertion after a character inherits those of its properties that are
"rear-sticky".  Insertion before a character inherits those of its
properties that are "front-sticky".  When both sides offer different
sticky values for the same property, the previous character's value
takes precedence.

   By default, a text property is rear-sticky but not front-sticky;
thus, the default is to inherit all the properties of the preceding
character, and nothing from the following character.

   You can control the stickiness of various text properties with two
specific text properties, `front-sticky' and `rear-nonsticky', and with
the variable `text-property-default-nonsticky'.  You can use the
variable to specify a different default for a given property.  You can
use those two text properties to make any specific properties sticky or
nonsticky in any particular part of the text.

   If a character's `front-sticky' property is `t', then all its
properties are front-sticky.  If the `front-sticky' property is a list,
then the sticky properties of the character are those whose names are
in the list.  For example, if a character has a `front-sticky' property
whose value is `(face read-only)', then insertion before the character
can inherit its `face' property and its `read-only' property, but no
others.

   The `rear-nonsticky' property works the opposite way.  Most
properties are rear-sticky by default, so the `rear-nonsticky' property
says which properties are _not_ rear-sticky.  If a character's
`rear-nonsticky' property is `t', then none of its properties are
rear-sticky.  If the `rear-nonsticky' property is a list, properties
are rear-sticky _unless_ their names are in the list.

 -- Variable: text-property-default-nonsticky
     This variable holds an alist which defines the default
     rear-stickiness of various text properties.  Each element has the
     form `(PROPERTY . NONSTICKINESS)', and it defines the stickiness
     of a particular text property, PROPERTY.

     If NONSTICKINESS is non-`nil', this means that the property
     PROPERTY is rear-nonsticky by default.  Since all properties are
     front-nonsticky by default, this makes PROPERTY nonsticky in both
     directions by default.

     The text properties `front-sticky' and `rear-nonsticky', when
     used, take precedence over the default NONSTICKINESS specified in
     `text-property-default-nonsticky'.

   Here are the functions that insert text with inheritance of
properties:

 -- Function: insert-and-inherit &rest strings
     Insert the strings STRINGS, just like the function `insert', but
     inherit any sticky properties from the adjoining text.

 -- Function: insert-before-markers-and-inherit &rest strings
     Insert the strings STRINGS, just like the function
     `insert-before-markers', but inherit any sticky properties from the
     adjoining text.

   *Note Insertion::, for the ordinary insertion functions which do not
inherit.
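
   The stickiness rules can be checked with a small experiment.  This
is only a sketch: `my-tag' is a hypothetical property and
`my-inherited-tag' a hypothetical helper, neither of which is part of
Emacs.  The helper inserts a character between two runs of text and
reports which value of `my-tag' the new character inherited.

```elisp
;; `my-tag' and `my-inherited-tag' are illustrative names only.
(defun my-inherited-tag (rear-nonsticky front-sticky)
  "Return the `my-tag' value inherited by text inserted between two runs.
REAR-NONSTICKY is put on the preceding text, FRONT-STICKY on the
following text."
  (with-temp-buffer
    (insert (propertize "ab" 'my-tag 'left 'rear-nonsticky rear-nonsticky))
    (insert (propertize "cd" 'my-tag 'right 'front-sticky front-sticky))
    (goto-char 3)                       ; between "b" and "c"
    (insert-and-inherit "X")
    (get-text-property 3 'my-tag)))     ; property of the new "X"

(my-inherited-tag nil nil)  ; default stickiness: inherits from "b"
(my-inherited-tag t t)      ; only "c" offers a sticky value
(my-inherited-tag t nil)    ; neither side is sticky
(my-inherited-tag nil t)    ; both sides sticky: preceding char wins
```

   The four calls return `left', `right', `nil' and `left'
respectively.  Replacing `insert-and-inherit' with `insert' makes every
call return `nil', since the ordinary insertion functions do not
inherit.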


File: elisp,  Node: Lazy Properties,  Next: Clickable Text,  Prev: Sticky Properties,  Up: Text Properties

32.19.7 Lazy Computation of Text Properties
-------------------------------------------

Instead of computing text properties for all the text in the buffer,
you can arrange to compute the text properties for parts of the text
when and if something depends on them.

   The primitive that extracts text from the buffer along with its
properties is `buffer-substring'.  Before examining the properties,
this function runs the abnormal hook `buffer-access-fontify-functions'.

 -- Variable: buffer-access-fontify-functions
     This variable holds a list of functions for computing text
     properties.  Before `buffer-substring' copies the text and text
     properties for a portion of the buffer, it calls all the functions
     in this list.  Each of the functions receives two arguments that
     specify the range of the buffer being accessed.  (The buffer
     itself is always the current buffer.)

   The function `buffer-substring-no-properties' does not call these
functions, since it ignores text properties anyway.

   In order to prevent the hook functions from being called more than
once for the same part of the buffer, you can use the variable
`buffer-access-fontified-property'.

 -- Variable: buffer-access-fontified-property
     If this variable's value is non-`nil', it is a symbol which is used
     as a text property name.  A non-`nil' value for that text property
     means, "the other text properties for this character have already
     been computed."

     If all the characters in the range specified for `buffer-substring'
     have a non-`nil' value for this property, `buffer-substring' does
     not call the `buffer-access-fontify-functions' functions.  It
     assumes these characters already have the right text properties,
     and just copies the properties they already have.

     The normal way to use this feature is that the
     `buffer-access-fontify-functions' functions add this property, as
     well as others, to the characters they operate on.  That way, they
     avoid being called over and over for the same text.
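
   As a minimal sketch of this arrangement, the following code counts
how often a fontification function runs.  The function
`my-lazy-fontify', the counter `my-fontify-calls' and the properties
`my-computed' and `my-fontified' are hypothetical names used only for
illustration.

```elisp
(defvar my-fontify-calls 0
  "Number of times `my-lazy-fontify' has run.")

(defun my-lazy-fontify (beg end)
  "Add the hypothetical `my-computed' property to the text in BEG..END.
Also mark the text with `my-fontified' so the work is not repeated."
  (setq my-fontify-calls (1+ my-fontify-calls))
  (add-text-properties beg end '(my-computed t my-fontified t)))

(defun my-demo-lazy-properties ()
  "Return (PROPERTY-VALUE HOOK-CALLS) after extracting the same text twice."
  (setq my-fontify-calls 0)
  (with-temp-buffer
    (insert "some lazily decorated text")
    (let ((buffer-access-fontify-functions '(my-lazy-fontify))
          (buffer-access-fontified-property 'my-fontified))
      (buffer-substring 1 5)            ; first access runs the hook
      (list (get-text-property 0 'my-computed (buffer-substring 1 5))
            my-fontify-calls))))
```

   Here `(my-demo-lazy-properties)' returns `(t 1)': the second call to
`buffer-substring' finds `my-fontified' on every character in the range
and so skips the hook.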


File: elisp,  Node: Clickable Text,  Next: Fields,  Prev: Lazy Properties,  Up: Text Properties

32.19.8 Defining Clickable Text
-------------------------------

"Clickable text" is text that can be clicked, either with the mouse or
via a keyboard command, to produce some result.  Many major modes use
clickable text to implement textual hyper-links, or "links" for short.

   The easiest way to insert and manipulate links is to use the
`button' package.  *Note Buttons::.  In this section, we will explain
how to manually set up clickable text in a buffer, using text
properties.  For simplicity, we will refer to the clickable text as a
"link".

   Implementing a link involves three separate steps: (1) indicating
clickability when the mouse moves over the link; (2) making <RET> or
`Mouse-2' on that link do something; and (3) setting up a `follow-link'
condition so that the link obeys `mouse-1-click-follows-link'.

   To indicate clickability, add the `mouse-face' text property to the
text of the link; then Emacs will highlight the link when the mouse
moves over it.  In addition, you should define a tooltip or echo area
message, using the `help-echo' text property.  *Note Special
Properties::.  For instance, here is how Dired indicates that file
names are clickable:

      (if (dired-move-to-filename)
          (add-text-properties
            (point)
            (save-excursion
              (dired-move-to-end-of-filename)
              (point))
            '(mouse-face highlight
              help-echo "mouse-2: visit this file in other window")))

   To make the link clickable, bind <RET> and `Mouse-2' to commands
that perform the desired action.  Each command should check to see
whether it was called on a link, and act accordingly.  For instance,
Dired's major mode keymap binds `Mouse-2' to the following command:

     (defun dired-mouse-find-file-other-window (event)
       "In Dired, visit the file or directory name you click on."
       (interactive "e")
       (let ((window (posn-window (event-end event)))
             (pos (posn-point (event-end event)))
             file)
         (if (not (windowp window))
             (error "No file chosen"))
         (with-current-buffer (window-buffer window)
           (goto-char pos)
           (setq file (dired-get-file-for-visit)))
         (if (file-directory-p file)
             (or (and (cdr dired-subdir-alist)
                      (dired-goto-subdir file))
                 (progn
                   (select-window window)
                   (dired-other-window file)))
           (select-window window)
           (find-file-other-window (file-name-sans-versions file t)))))

This command uses the functions `posn-window' and `posn-point' to
determine where the click occurred, and `dired-get-file-for-visit' to
determine which file to visit.

   Instead of binding the mouse command in a major mode keymap, you can
bind it within the link text, using the `keymap' text property (*note
Special Properties::).  For instance:

     (let ((map (make-sparse-keymap)))
       (define-key map [mouse-2] 'operate-this-button)
       (put-text-property link-start link-end 'keymap map))

With this method, you can easily define different commands for
different links.  Furthermore, the global definitions of <RET> and
`Mouse-2' remain available for the rest of the text in the buffer.

   The basic Emacs command for clicking on links is `Mouse-2'.
However, for compatibility with other graphical applications, Emacs
also recognizes `Mouse-1' clicks on links, provided the user clicks on
the link quickly without moving the mouse.  This behavior is controlled
by the user option `mouse-1-click-follows-link'.  *Note Mouse
References: (emacs)Mouse References.

   To set up the link so that it obeys `mouse-1-click-follows-link',
you must either (1) apply a `follow-link' text or overlay property to
the link text, or (2) bind the `follow-link' event to a keymap (which
can be a major mode keymap or a local keymap specified via the `keymap'
text property).  The value of the `follow-link' property, or the
binding for the `follow-link' event, acts as a "condition" for the link
action.  This condition tells Emacs two things: the circumstances under
which a `Mouse-1' click should be regarded as occurring "inside" the
link, and how to compute an "action code" that says what to translate
the `Mouse-1' click into.  The link action condition can be one of the
following:

`mouse-face'
     If the condition is the symbol `mouse-face', a position is inside
     a link if there is a non-`nil' `mouse-face' property at that
     position.  The action code is always `t'.

     For example, here is how Info mode handles <Mouse-1>:

          (define-key Info-mode-map [follow-link] 'mouse-face)

a function
     If the condition is a function, FUNC, then a position POS is
     inside a link if `(FUNC POS)' evaluates to non-`nil'.  The value
     returned by FUNC serves as the action code.

     For example, here is how pcvs enables `Mouse-1' to follow links on
     file names only:

          (define-key map [follow-link]
            (lambda (pos)
              (eq (get-char-property pos 'face) 'cvs-filename-face)))

anything else
     If the condition value is anything else, then the position is
     inside a link and the condition itself is the action code.
     Clearly, you should specify this kind of condition only when
     applying the condition via a text or overlay property on the link
     text (so that it does not apply to the entire buffer).

The action code tells `Mouse-1' how to follow the link:

a string or vector
     If the action code is a string or vector, the `Mouse-1' event is
     translated into the first element of the string or vector; i.e.,
     the action of the `Mouse-1' click is the local or global binding of
     that character or symbol.  Thus, if the action code is `"foo"',
     `Mouse-1' translates into `f'.  If it is `[foo]', `Mouse-1'
     translates into <foo>.

anything else
     For any other non-`nil' action code, the `Mouse-1' event is
     translated into a `Mouse-2' event at the same position.

   To define `Mouse-1' to activate a button defined with
`define-button-type', give the button a `follow-link' property.  The
property value should be a link action condition, as described above.
*Note Buttons::.  For example, here is how Help mode handles `Mouse-1':

     (define-button-type 'help-xref
       'follow-link t
       'action #'help-button-action)

   To define `Mouse-1' on a widget defined with `define-widget', give
the widget a `:follow-link' property.  The property value should be a
link action condition, as described above.  For example, here is how
the `link' widget specifies that a <Mouse-1> click shall be translated
to <RET>:

     (define-widget 'link 'item
       "An embedded link."
       :button-prefix 'widget-link-prefix
       :button-suffix 'widget-link-suffix
       :follow-link "\C-m"
       :help-echo "Follow the link."
       :format "%[%t%]")

 -- Function: mouse-on-link-p pos
     This function returns non-`nil' if position POS in the current
     buffer is on a link.  POS can also be a mouse event location, as
     returned by `event-start' (*note Accessing Mouse::).
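
   The three steps described in this node can be combined in one
helper.  The following is a sketch under stated assumptions, not the
way any particular mode does it: `my-target' is a hypothetical property
holding the link destination, and `my-link-target',
`my-follow-link-at-point', `my-follow-link-at-mouse', `my-link-map' and
`my-insert-link' are hypothetical names.

```elisp
(defun my-link-target (pos)
  "Return the hypothetical `my-target' property at POS."
  (get-text-property pos 'my-target))

(defun my-follow-link-at-point ()
  "Follow the link at point (here, just report its target)."
  (interactive)
  (message "Following %s" (my-link-target (point))))

(defun my-follow-link-at-mouse (event)
  "Follow the link that mouse EVENT clicked on."
  (interactive "e")
  (let ((posn (event-end event)))
    (with-current-buffer (window-buffer (posn-window posn))
      (message "Following %s" (my-link-target (posn-point posn))))))

(defvar my-link-map
  (let ((map (make-sparse-keymap)))
    (define-key map (kbd "RET") #'my-follow-link-at-point)
    (define-key map [mouse-2] #'my-follow-link-at-mouse)
    map)
  "Keymap placed on link text via the `keymap' text property.")

(defun my-insert-link (label target)
  "Insert LABEL at point as a link to TARGET; return the link's start."
  (let ((start (point)))
    (insert label)
    (add-text-properties
     start (point)
     (list 'mouse-face 'highlight                   ; step 1
           'help-echo "mouse-2, RET: follow this link"
           'keymap my-link-map                      ; step 2
           'follow-link t                           ; step 3
           'my-target target))
    start))
```

   Because the `follow-link' value `t' is neither `mouse-face' nor a
function, it serves as its own action code, so a `Mouse-1' click on the
link is translated into `Mouse-2' and reaches
`my-follow-link-at-mouse'.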


File: elisp,  Node: Fields,  Next: Not Intervals,  Prev: Clickable Text,  Up: Text Properties

32.19.9 Defining and Using Fields
---------------------------------

A field is a range of consecutive characters in the buffer that are
identified by having the same value (comparing with `eq') of the
`field' property (either a text-property or an overlay property).  This
section describes special functions that are available for operating on
fields.

   You specify a field with a buffer position, POS.  We think of each
field as containing a range of buffer positions, so the position you
specify stands for the field containing that position.

   When the characters before and after POS are part of the same field,
there is no doubt which field contains POS: the one those characters
both belong to.  When POS is at a boundary between fields, which field
it belongs to depends on the stickiness of the `field' properties of
the two surrounding characters (*note Sticky Properties::).  The field
whose property would be inherited by text inserted at POS is the field
that contains POS.

   There is an anomalous case where newly inserted text at POS would
not inherit the `field' property from either side.  This happens if the
previous character's `field' property is not rear-sticky, and the
following character's `field' property is not front-sticky.  In this
case, POS belongs to neither the preceding field nor the following
field; the field functions treat it as belonging to an empty field
whose beginning and end are both at POS.

   In all of these functions, if POS is omitted or `nil', the value of
point is used by default.  If narrowing is in effect, then POS should
fall within the accessible portion.  *Note Narrowing::.

 -- Function: field-beginning &optional pos escape-from-edge limit
     This function returns the beginning of the field specified by POS.

     If POS is at the beginning of its field, and ESCAPE-FROM-EDGE is
     non-`nil', then the return value is always the beginning of the
     preceding field that _ends_ at POS, regardless of the stickiness
     of the `field' properties around POS.

     If LIMIT is non-`nil', it is a buffer position; if the beginning
     of the field is before LIMIT, then LIMIT will be returned instead.

 -- Function: field-end &optional pos escape-from-edge limit
     This function returns the end of the field specified by POS.

     If POS is at the end of its field, and ESCAPE-FROM-EDGE is
     non-`nil', then the return value is always the end of the following
     field that _begins_ at POS, regardless of the stickiness of the
     `field' properties around POS.

     If LIMIT is non-`nil', it is a buffer position; if the end of the
     field is after LIMIT, then LIMIT will be returned instead.

 -- Function: field-string &optional pos
     This function returns the contents of the field specified by POS,
     as a string.

 -- Function: field-string-no-properties &optional pos
     This function returns the contents of the field specified by POS,
     as a string, discarding text properties.

 -- Function: delete-field &optional pos
     This function deletes the text of the field specified by POS.

 -- Function: constrain-to-field new-pos old-pos &optional
          escape-from-edge only-in-line inhibit-capture-property
     This function "constrains" NEW-POS to the field that OLD-POS
     belongs to--in other words, it returns the position closest to
     NEW-POS that is in the same field as OLD-POS.

     If NEW-POS is `nil', then `constrain-to-field' uses the value of
     point instead, and moves point to the resulting position as well
     as returning it.

     If OLD-POS is at the boundary of two fields, then the acceptable
     final positions depend on the argument ESCAPE-FROM-EDGE.  If
     ESCAPE-FROM-EDGE is `nil', then NEW-POS must be in the field whose
     `field' property equals what new characters inserted at OLD-POS
     would inherit.  (This depends on the stickiness of the `field'
     property for the characters before and after OLD-POS.)  If
     ESCAPE-FROM-EDGE is non-`nil', NEW-POS can be anywhere in the two
     adjacent fields.  Additionally, if two fields are separated by
     another field with the special value `boundary', then any point
     within this special field is also considered to be "on the
     boundary."

     Commands like `C-a' with no argument, that normally move backward
     to a specific kind of location and stay there once there, probably
     should specify `nil' for ESCAPE-FROM-EDGE.  Other motion commands
     that check fields should probably pass `t'.

     If the optional argument ONLY-IN-LINE is non-`nil', and
     constraining NEW-POS in the usual way would move it to a different
     line, NEW-POS is returned unconstrained.  This is used in commands
     that move by line, such as `next-line' and `beginning-of-line', so
     that they respect field boundaries only in the case where they can
     still move to the right line.

     If the optional argument INHIBIT-CAPTURE-PROPERTY is non-`nil',
     and OLD-POS has a non-`nil' property of that name, then any field
     boundaries are ignored.

     You can cause `constrain-to-field' to ignore all field boundaries
     (and so never constrain anything) by binding the variable
     `inhibit-field-text-motion' to a non-`nil' value.
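
   Here is a small sketch of the field functions at work.  It builds a
buffer with two adjacent fields and queries a position strictly inside
the second one, so that stickiness at the boundary plays no role.  The
function name `my-demo-fields' and the field values `prompt' and
`value' are hypothetical.

```elisp
(defun my-demo-fields ()
  "Return field information for a position inside the second field."
  (with-temp-buffer
    ;; "Name: " occupies positions 1-6 and "Alice" positions 7-11.
    (insert (propertize "Name: " 'field 'prompt)
            (propertize "Alice" 'field 'value))
    (list (field-beginning 9)
          (field-end 9)
          (field-string-no-properties 9)
          ;; Position 3 is in the other field, so it is constrained
          ;; to the nearest position in the field containing 9.
          (constrain-to-field 3 9))))
```

   The call `(my-demo-fields)' returns `(7 12 "Alice" 7)'.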


File: elisp,  Node: Not Intervals,  Prev: Fields,  Up: Text Properties

32.19.10 Why Text Properties are not Intervals
----------------------------------------------

Some editors that support adding attributes to text in the buffer do so
by letting the user specify "intervals" within the text, and adding the
properties to the intervals.  Those editors permit the user or the
programmer to determine where individual intervals start and end.  We
deliberately provided a different sort of interface in Emacs Lisp to
avoid certain paradoxical behavior associated with text modification.

   If the actual subdivision into intervals is meaningful, that means
you can distinguish between a buffer that is just one interval with a
certain property, and a buffer containing the same text subdivided into
two intervals, both of which have that property.

   Suppose you take the buffer with just one interval and kill part of
the text.  The text remaining in the buffer is one interval, and the
copy in the kill ring (and the undo list) becomes a separate interval.
Then if you yank back the killed text, you get two intervals with the
same properties.  Thus, editing does not preserve the distinction
between one interval and two.

   Suppose we "fix" this problem by coalescing the two intervals when
the text is inserted.  That works fine if the buffer originally was a
single interval.  But suppose instead that we have two adjacent
intervals with the same properties, and we kill the text of one interval
and yank it back.  The same interval-coalescence feature that rescues
the other case causes trouble in this one: after yanking, we have just
one interval.  Once again, editing does not preserve the distinction
between one interval and two.

   Insertion of text at the border between intervals also raises
questions that have no satisfactory answer.

   However, it is easy to arrange for editing to behave consistently for
questions of the form, "What are the properties of this character?"  So
we have decided these are the only questions that make sense; we have
not implemented asking questions about where intervals start or end.

   In practice, you can usually use the text property search functions
in place of explicit interval boundaries.  You can think of them as
finding the boundaries of intervals, assuming that intervals are always
coalesced whenever possible.  *Note Property Search::.
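
   For instance, the following sketch finds the extent of a run of text
carrying a property.  The property `my-tag' and the functions
`my-tagged-run-after' and `my-demo-coalesced-run' are hypothetical
names for this illustration.  The property is added in two separate
adjacent steps, yet the search reports a single run.

```elisp
(defun my-tagged-run-after (pos)
  "Return (START . END) of the first run with `my-tag' at or after POS.
Return nil if there is no such run in the current buffer."
  (let ((start (if (get-text-property pos 'my-tag)
                   pos
                 (next-single-property-change pos 'my-tag))))
    (when start
      (cons start
            (or (next-single-property-change start 'my-tag)
                (point-max))))))

(defun my-demo-coalesced-run ()
  "Tag two adjacent stretches of text and return the single run found."
  (with-temp-buffer
    (insert "aaabbbbbbccc")
    (put-text-property 4 7 'my-tag t)
    (put-text-property 7 10 'my-tag t)  ; adjacent, same value
    (my-tagged-run-after (point-min))))
```

   The call `(my-demo-coalesced-run)' returns `(4 . 10)'; nothing in
the result reveals that the property was added in two pieces.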

   Emacs also provides explicit intervals as a presentation feature; see
*note Overlays::.


File: elisp,  Node: Substitution,  Next: Transposition,  Prev: Text Properties,  Up: Text

32.20 Substituting for a Character Code
=======================================

The following functions replace characters within a specified region
based on their character codes.

 -- Function: subst-char-in-region start end old-char new-char
          &optional noundo
     This function replaces all occurrences of the character OLD-CHAR
     with the character NEW-CHAR in the region of the current buffer
     defined by START and END.

     If NOUNDO is non-`nil', then `subst-char-in-region' does not
     record the change for undo and does not mark the buffer as
     modified.  This was useful for controlling the old selective
     display feature (*note Selective Display::).

     `subst-char-in-region' does not move point and returns `nil'.

          ---------- Buffer: foo ----------
          This is the contents of the buffer before.
          ---------- Buffer: foo ----------

          (subst-char-in-region 1 20 ?i ?X)
               => nil

          ---------- Buffer: foo ----------
          ThXs Xs the contents of the buffer before.
          ---------- Buffer: foo ----------

 -- Command: translate-region start end table
     This function applies a translation table to the characters in the
     buffer between positions START and END.

     The translation table TABLE is a string or a char-table; `(aref
     TABLE OCHAR)' gives the translated character corresponding to
     OCHAR.  If TABLE is a string, any characters with codes larger
     than the length of TABLE are not altered by the translation.

     The return value of `translate-region' is the number of characters
     that were actually changed by the translation.  This does not
     count characters that were mapped into themselves in the
     translation table.
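
   As an illustration, here is a sketch that uses a string as the
translation table.  The function name `my-demo-translate' is
hypothetical.  The table maps every ASCII character to itself except
`a', which is mapped to `X'.

```elisp
(defun my-demo-translate ()
  "Translate `a' to `X' in a scratch buffer.
Return (NUMBER-OF-CHANGES RESULTING-TEXT)."
  (let ((table (make-string 128 0)))
    ;; Start from the identity mapping for character codes 0..127.
    (dotimes (code 128)
      (aset table code code))
    (aset table ?a ?X)
    (with-temp-buffer
      (insert "banana")
      (list (translate-region (point-min) (point-max) table)
            (buffer-string)))))
```

   The call `(my-demo-translate)' returns `(3 "bXnXnX")': three
characters were changed, and the characters that map to themselves are
not counted.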


File: elisp,  Node: Registers,  Next: Base 64,  Prev: Transposition,  Up: Text

32.21 Registers
===============

A register is a sort of variable used in Emacs editing that can hold a
variety of different kinds of values.  Each register is named by a
single character.  All ASCII characters and their meta variants (but
with the exception of `C-g') can be used to name registers.  Thus,
there are 255 possible registers.  A register is designated in Emacs
Lisp by the character that is its name.

 -- Variable: register-alist
     This variable is an alist of elements of the form `(NAME .
     CONTENTS)'.  Normally, there is one element for each Emacs
     register that has been used.

     The object NAME is a character (an integer) identifying the
     register.

   The CONTENTS of a register can have several possible types:

a number
     A number stands for itself.  If `insert-register' finds a number
     in the register, it converts the number to decimal.

a marker
     A marker represents a buffer position to jump to.

a string
     A string is text saved in the register.

a rectangle
     A rectangle is represented by a list of strings.

`(WINDOW-CONFIGURATION POSITION)'
     This represents a window configuration to restore in one frame,
     and a position to jump to in the current buffer.

`(FRAME-CONFIGURATION POSITION)'
     This represents a frame configuration to restore, and a position
     to jump to in the current buffer.

`(file FILENAME)'
     This represents a file to visit; jumping to this value visits file
     FILENAME.

`(file-query FILENAME POSITION)'
     This represents a file to visit and a position in it; jumping to
     this value visits file FILENAME and goes to buffer position
     POSITION.  Restoring this type of position asks the user for
     confirmation first.

   The functions in this section return unpredictable values unless
otherwise stated.

 -- Function: get-register reg
     This function returns the contents of the register REG, or `nil'
     if it has no contents.

 -- Function: set-register reg value
     This function sets the contents of register REG to VALUE.  A
     register can be set to any value, but the other register functions
     expect only certain data types.  The return value is VALUE.

 -- Command: view-register reg
     This command displays what is contained in register REG.

 -- Command: insert-register reg &optional beforep
     This command inserts contents of register REG into the current
     buffer.

     Normally, this command puts point before the inserted text, and the
     mark after it.  However, if the optional second argument BEFOREP
     is non-`nil', it puts the mark before and point after.  You can
     pass a non-`nil' second argument BEFOREP to this function
     interactively by supplying any prefix argument.

     If the register contains a rectangle, then the rectangle is
     inserted with its upper left corner at point.  This means that
     text is inserted in the current line and underneath it on
     successive lines.

     If the register contains something other than saved text (a
     string) or a rectangle (a list), currently useless things happen.
     This may be changed in the future.
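
   The following sketch stores a string in a register and inserts it
into a scratch buffer.  The function name `my-demo-registers' is
hypothetical; binding `register-alist' with `let' is just a precaution
so that the example leaves the user's own registers alone.

```elisp
(defun my-demo-registers ()
  "Store text in register `a' and insert it into a scratch buffer.
Return (CONTENTS-OF-a CONTENTS-OF-z BUFFER-TEXT)."
  (let ((register-alist nil))           ; leave the real registers alone
    (set-register ?a "saved text")
    (with-temp-buffer
      (insert "[]")
      (goto-char 2)                     ; between the brackets
      ;; A non-nil BEFOREP puts the mark before the inserted text
      ;; and point after it.
      (insert-register ?a t)
      (list (get-register ?a)
            (get-register ?z)           ; never set, so nil
            (buffer-string)))))
```

   The call `(my-demo-registers)' returns
`("saved text" nil "[saved text]")'.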


File: elisp,  Node: Transposition,  Next: Registers,  Prev: Substitution,  Up: Text

32.22 Transposition of Text
===========================

This subroutine is used by the transposition commands.

 -- Function: transpose-regions start1 end1 start2 end2 &optional
          leave-markers
     This function exchanges two nonoverlapping portions of the buffer.
     Arguments START1 and END1 specify the bounds of one portion and
     arguments START2 and END2 specify the bounds of the other portion.

     Normally, `transpose-regions' relocates markers with the transposed
     text; a marker previously positioned within one of the two
     transposed portions moves along with that portion, thus remaining
     between the same two characters in their new position.  However,
     if LEAVE-MARKERS is non-`nil', `transpose-regions' does not do
     this--it leaves all markers unrelocated.
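
   For example, this sketch swaps the two words of a short text; the
function name `my-demo-transpose' is hypothetical.

```elisp
(defun my-demo-transpose ()
  "Exchange the two words of \"hello world\" and return the result."
  (with-temp-buffer
    (insert "hello world")
    ;; "hello" is the text between positions 1 and 6,
    ;; "world" the text between positions 7 and 12.
    (transpose-regions 1 6 7 12)
    (buffer-string)))
```

   The call `(my-demo-transpose)' returns `"world hello"'; the space
between the two portions stays where it was.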


File: elisp,  Node: Base 64,  Next: MD5 Checksum,  Prev: Registers,  Up: Text

32.23 Base 64 Encoding
======================

Base 64 code is used in email to encode a sequence of 8-bit bytes as a
longer sequence of ASCII graphic characters.  It is defined in Internet
RFC 2045(1).  This section describes the functions for converting to and
from this code.

 -- Command: base64-encode-region beg end &optional no-line-break
     This function converts the region from BEG to END into base 64
     code.  It returns the length of the encoded text.  An error is
     signaled if a character in the region is multibyte, i.e. in a
     multibyte buffer the region must contain only characters from the
     charsets `ascii', `eight-bit-control' and `eight-bit-graphic'.

     Normally, this function inserts newline characters into the encoded
     text, to avoid overlong lines.  However, if the optional argument
     NO-LINE-BREAK is non-`nil', these newlines are not added, so the
     output is just one long line.

 -- Function: base64-encode-string string &optional no-line-break
     This function converts the string STRING into base 64 code.  It
     returns a string containing the encoded text.  As for
     `base64-encode-region', an error is signaled if a character in the
     string is multibyte.

     Normally, this function inserts newline characters into the encoded
     text, to avoid overlong lines.  However, if the optional argument
     NO-LINE-BREAK is non-`nil', these newlines are not added, so the
     result string is just one long line.

 -- Command: base64-decode-region beg end
     This function converts the region from BEG to END from base 64
     code into the corresponding decoded text.  It returns the length of
     the decoded text.

     The decoding functions ignore newline characters in the encoded
     text.

 -- Function: base64-decode-string string
     This function converts the string STRING from base 64 code into
     the corresponding decoded text.  It returns a unibyte string
     containing the decoded text.

     The decoding functions ignore newline characters in the encoded
     text.
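
   The string functions can be tried out as follows.  This is only a
sketch; the variable names beginning with `my-' are hypothetical.
Because the encoding functions reject multibyte characters, the last
form first encodes the text with `encode-coding-string', here assuming
that UTF-8 is the wanted byte representation.

```elisp
(defvar my-encoded (base64-encode-string "Hello, world!"))
;; my-encoded => "SGVsbG8sIHdvcmxkIQ=="
(defvar my-decoded (base64-decode-string my-encoded))
;; my-decoded => "Hello, world!"

;; For long input, newlines are added unless NO-LINE-BREAK is non-nil.
(defvar my-long (make-string 100 ?a))
(defvar my-wrapped (base64-encode-string my-long))
(defvar my-unwrapped (base64-encode-string my-long t))

;; Non-ASCII text must be turned into bytes before it is encoded.
(defvar my-utf8-encoded
  (base64-encode-string (encode-coding-string "\u00e9" 'utf-8)))
;; my-utf8-encoded => "w6k="
```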

   ---------- Footnotes ----------

   (1) An RFC, an acronym for "Request for Comments", is a numbered
Internet informational document describing a standard.  RFCs are
usually written by technical experts acting on their own initiative,
and are traditionally written in a pragmatic, experience-driven manner.


File: elisp,  Node: MD5 Checksum,  Next: Atomic Changes,  Prev: Base 64,  Up: Text

32.24 MD5 Checksum
==================

MD5 cryptographic checksums, or "message digests", are 128-bit
"fingerprints" of a document or program.  They are used to verify that
you have an exact and unaltered copy of the data.  The algorithm to
calculate the MD5 message digest is defined in Internet RFC(1)1321.
This section describes the Emacs facilities for computing message
digests.

 -- Function: md5 object &optional start end coding-system noerror
     This function returns the MD5 message digest of OBJECT, which
     should be a buffer or a string.

     The two optional arguments START and END are character positions
     specifying the portion of OBJECT to compute the message digest
     for.  If they are `nil' or omitted, the digest is computed for the
     whole of OBJECT.

     The function `md5' does not compute the message digest directly
     from the internal Emacs representation of the text (*note Text
     Representations::).  Instead, it encodes the text using a coding
     system, and computes the message digest from the encoded text.  The
     optional fourth argument CODING-SYSTEM specifies which coding
     system to use for encoding the text.  It should be the same coding
     system that you used to read the text, or that you used or will use
     when saving or sending the text.  *Note Coding Systems::, for more
     information about coding systems.

     If CODING-SYSTEM is `nil' or omitted, the default depends on
     OBJECT.  If OBJECT is a buffer, the default for CODING-SYSTEM is
     whatever coding system would be chosen by default for writing this
     text into a file.  If OBJECT is a string, the user's most
     preferred coding system (*note prefer-coding-system:
     (emacs)Recognize Coding.) is used.

     Normally, `md5' signals an error if the text can't be encoded
     using the specified or chosen coding system.  However, if NOERROR
     is non-`nil', it silently uses `raw-text' coding instead.
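
   For instance, the digest of the string `"abc"' is one of the test
values listed in RFC 1321.  The sketch below also computes a digest
for part of a buffer; `my-md5-of-region-demo' is a hypothetical name,
and `utf-8' is passed explicitly only so that the result does not
depend on the defaults described above.

```elisp
(md5 "abc")
     ;; => "900150983cd24fb0d6963f7d28e17f72"

(defun my-md5-of-region-demo ()
  "Return the MD5 digest of the text \"abc\" inside a scratch buffer."
  (with-temp-buffer
    (insert "xxabcxx")
    ;; Positions 3 and 6 delimit the three characters "abc".
    (md5 (current-buffer) 3 6 'utf-8)))

(equal (my-md5-of-region-demo) (md5 "abc"))
     ;; => t
```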

   ---------- Footnotes ----------

   (1) For an explanation of what an RFC is, see the footnote in *note
Base 64::.


File: elisp,  Node: Atomic Changes,  Next: Change Hooks,  Prev: MD5 Checksum,  Up: Text

32.25 Atomic Change Groups
==========================

In database terminology, an "atomic" change is an indivisible
change--it can succeed entirely or it can fail entirely, but it cannot
partly succeed.  A Lisp program can make a series of changes to one or
several buffers as an "atomic change group", meaning that either the
entire series of changes will be installed in their buffers or, in case
of an error, none of them will be.

   To do this for one buffer, the one already current, simply write a
call to `atomic-change-group' around the code that makes the changes,
like this:

     (atomic-change-group
       (insert foo)
       (delete-region x y))

If an error (or other nonlocal exit) occurs inside the body of
`atomic-change-group', it unmakes all the changes in that buffer that
were made during the execution of the body.  This kind of change group
has no effect on any other buffers--any such changes remain.

   If you need something more sophisticated, such as to make changes in
various buffers constitute one atomic group, you must directly call
lower-level functions that `atomic-change-group' uses.

 -- Function: prepare-change-group &optional buffer
     This function sets up a change group for buffer BUFFER, which
     defaults to the current buffer.  It returns a "handle" that
     represents the change group.  You must use this handle to activate
     the change group and subsequently to finish it.

   To use the change group, you must "activate" it.  You must do this
before making any changes in the text of BUFFER.

 -- Function: activate-change-group handle
     This function activates the change group that HANDLE designates.

   After you activate the change group, any changes you make in that
buffer become part of it.  Once you have made all the desired changes
in the buffer, you must "finish" the change group.  There are two ways
to do this: you can either accept (and finalize) all the changes, or
cancel them all.

 -- Function: accept-change-group handle
     This function accepts all the changes in the change group
     specified by HANDLE, making them final.

 -- Function: cancel-change-group handle
     This function cancels and undoes all the changes in the change
     group specified by HANDLE.

   Your code should use `unwind-protect' to make sure the group is
always finished.  The call to `activate-change-group' should be inside
the `unwind-protect', in case the user types `C-g' just after it runs.
(This is one reason why `prepare-change-group' and
`activate-change-group' are separate functions, because normally you
would call `prepare-change-group' before the start of that
`unwind-protect'.)  Once you finish the group, don't use the handle
again--in particular, don't try to finish the same group twice.
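
   The pattern just described might be sketched as follows for the
current buffer.  The function `my-atomic-demo' and its arguments are
invented for this illustration; the `success' flag is one possible way
to decide between accepting and cancelling the group.

```elisp
(defun my-atomic-demo (text fail)
  "Insert TEXT twice in the current buffer as one change group.
If FAIL is non-nil, signal an error between the two insertions."
  (let ((handle (prepare-change-group))   ; before the `unwind-protect'
        (success nil))
    (unwind-protect
        (progn
          (activate-change-group handle)  ; inside the `unwind-protect'
          (insert text)
          (when fail
            (error "Simulated failure"))
          (insert text)
          (setq success t))
      ;; Finish the group exactly once, whichever way the body exits.
      (if success
          (accept-change-group handle)
        (cancel-change-group handle)))))
```

   When FAIL is non-`nil', the error still propagates to the caller,
but the first insertion is undone before it does.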

   To make a multibuffer change group, call `prepare-change-group' once
for each buffer you want to cover, then use `nconc' to combine the
returned values, like this:

     (nconc (prepare-change-group buffer-1)
            (prepare-change-group buffer-2))

   You can then activate the multibuffer change group with a single call
to `activate-change-group', and finish it with a single call to
`accept-change-group' or `cancel-change-group'.

   Nested use of several change groups for the same buffer works as you
would expect.  Non-nested use of change groups for the same buffer will
get Emacs confused, so don't let it happen; the first change group you
start for any given buffer should be the last one finished.


File: elisp,  Node: Change Hooks,  Prev: Atomic Changes,  Up: Text

32.26 Change Hooks
==================

These hook variables let you arrange to take notice of all changes in
all buffers (or in a particular buffer, if you make them buffer-local).
See also *note Special Properties::, for how to detect changes to
specific parts of the text.

   The functions you use in these hooks should save and restore the
match data if they do anything that uses regular expressions;
otherwise, they will interfere in bizarre ways with the editing
operations that call them.

 -- Variable: before-change-functions
     This variable holds a list of functions to call before any buffer
     modification.  Each function gets two arguments, the beginning and
     end of the region that is about to change, represented as
     integers.  The buffer that is about to change is always the
     current buffer.

 -- Variable: after-change-functions
     This variable holds a list of functions to call after any buffer
     modification.  Each function receives three arguments: the
     beginning and end of the region just changed, and the length of
     the text that existed before the change.  All three arguments are
     integers.  The buffer that has just been changed is always the
     current buffer.

     The length of the old text is the difference between the buffer
     positions before and after that text as it was before the change.
     As for the changed text, its length is simply the difference
     between the first two arguments.

   Output of messages into the `*Messages*' buffer does not call these
functions.
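
   As an illustration of the arguments passed to after-change
functions, the following sketch records each change made in a
temporary buffer.  The names `my-change-log', `my-record-change' and
`my-change-log-demo' are hypothetical.

```elisp
(defvar my-change-log nil
  "Hypothetical log of (BEG END OLD-LENGTH) records, newest first.")

(defun my-record-change (beg end old-length)
  "Record one change; suitable for `after-change-functions'."
  (push (list beg end old-length) my-change-log))

(defun my-change-log-demo ()
  "Return the changes recorded for an insertion and a deletion."
  (setq my-change-log nil)
  (with-temp-buffer
    ;; The final `t' makes the hook buffer-local.
    (add-hook 'after-change-functions 'my-record-change nil t)
    (insert "abc")          ; three characters inserted at position 1
    (delete-region 1 3)     ; two characters deleted at position 1
    (reverse my-change-log)))

(my-change-log-demo)
     ;; => ((1 4 0) (1 1 2))
```

   The recording function does not use regular expressions, so it has
no need to save and restore the match data.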

 -- Macro: combine-after-change-calls body...
     The macro executes BODY normally, but arranges to call the
     after-change functions just once for a series of several
     changes--if that seems safe.

     If a program makes several text changes in the same area of the
     buffer, using the macro `combine-after-change-calls' around that
     part of the program can make it run considerably faster when
     after-change hooks are in use.  When the after-change hooks are
     ultimately called, the arguments specify a portion of the buffer
     including all of the changes made within the
     `combine-after-change-calls' body.

     *Warning:* You must not alter the values of
     `after-change-functions' within the body of a
     `combine-after-change-calls' form.

     *Warning:* if the changes you combine occur in widely scattered
     parts of the buffer, this will still work, but it is not advisable,
     because it may lead to inefficient behavior for some change hook
     functions.
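
   A possible use is sketched below, with several replacements made
close together in one buffer.  The function name
`my-combined-edit-demo' is hypothetical, and whether the calls are
actually combined depends on the conditions Emacs checks; the text of
the buffer is the same either way.

```elisp
(defun my-combined-edit-demo ()
  "Replace every space with an underscore inside one combined form."
  (with-temp-buffer
    (insert "one two three")
    (combine-after-change-calls
      (goto-char (point-min))
      (while (search-forward " " nil t)
        (replace-match "_")))
    (buffer-string)))

(my-combined-edit-demo)
     ;; => "one_two_three"
```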

 -- Variable: first-change-hook
     This variable is a normal hook that is run whenever a buffer is
     changed that was previously in the unmodified state.

 -- Variable: inhibit-modification-hooks
     If this variable is non-`nil', all of the change hooks are
     disabled; none of them run.  This affects all the hook variables
     described above in this section, as well as the hooks attached to
     certain special text properties (*note Special Properties::) and
     overlay properties (*note Overlay Properties::).

     Also, this variable is bound to non-`nil' while running those same
     hook variables, so that by default modifying the buffer from a
     modification hook does not cause other modification hooks to be
     run.  If you do want modification hooks to be run in a particular
     piece of code that is itself run from a modification hook, then
     rebind locally `inhibit-modification-hooks' to `nil'.
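
   The effect of binding this variable can be seen in a sketch like the
following, where `my-hook-calls', `my-count-change' and
`my-inhibit-demo' are hypothetical names:

```elisp
(defvar my-hook-calls 0
  "Hypothetical counter of calls made to `my-count-change'.")

(defun my-count-change (&rest _ignored)
  "Count one buffer change; suitable for `after-change-functions'."
  (setq my-hook-calls (1+ my-hook-calls)))

(defun my-inhibit-demo ()
  "Return how many of two insertions were seen by the hook."
  (setq my-hook-calls 0)
  (with-temp-buffer
    (add-hook 'after-change-functions 'my-count-change nil t)
    (insert "seen")                       ; the hook runs
    (let ((inhibit-modification-hooks t))
      (insert "unseen"))                  ; the hook is suppressed
    my-hook-calls))

(my-inhibit-demo)
     ;; => 1
```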


File: elisp,  Node: Non-ASCII Characters,  Next: Searching and Matching,  Prev: Text,  Up: Top

33 Non-ASCII Characters
***********************

This chapter covers the special issues relating to characters and how
they are stored in strings and buffers.

* Menu:

* Text Representations::    How Emacs represents text.
* Converting Representations::  Converting unibyte to multibyte and vice versa.
* Selecting a Representation::  Treating a byte sequence as unibyte or multi.
* Character Codes::         How unibyte and multibyte relate to
                                codes of individual characters.
* Character Properties::    Character attributes that define their
                                behavior and handling.
* Character Sets::          The space of possible character codes
                                is divided into various character sets.
* Scanning Charsets::       Which character sets are used in a buffer?
* Translation of Characters::   Translation tables are used for conversion.
* Coding Systems::          Coding systems are conversions for saving files.
* Input Methods::           Input methods allow users to enter various
                                non-ASCII characters without special keyboards.
* Locales::                 Interacting with the POSIX locale.


File: elisp,  Node: Text Representations,  Next: Converting Representations,  Up: Non-ASCII Characters

33.1 Text Representations
=========================

Emacs buffers and strings support a large repertoire of characters from
many different scripts, allowing users to type and display text in
almost any known written language.

   To support this multitude of characters and scripts, Emacs closely
follows the "Unicode Standard".  The Unicode Standard assigns a unique
number, called a "codepoint", to each and every character.  The range
of codepoints defined by Unicode, or the Unicode "codespace", is
`0..#x10FFFF' (in hexadecimal notation), inclusive.  Emacs extends this
range with codepoints in the range `#x110000..#x3FFFFF', which it uses
for representing characters that are not unified with Unicode and "raw
8-bit bytes" that cannot be interpreted as characters.  Thus, a
character codepoint in Emacs is a 22-bit integer number.

   To conserve memory, Emacs does not hold fixed-length 22-bit numbers
that are codepoints of text characters within buffers and strings.
Rather, Emacs uses a variable-length internal representation of
characters, that stores each character as a sequence of 1 to 5 8-bit
bytes, depending on the magnitude of its codepoint(1).  For example,
any ASCII character takes up only 1 byte, a Latin-1 character takes up
2 bytes, etc.  We call this representation of text "multibyte".

   Outside Emacs, characters can be represented in many different
encodings, such as ISO-8859-1, GB-2312, Big-5, etc.  Emacs converts
between these external encodings and its internal representation, as
appropriate, when it reads text into a buffer or a string, or when it
writes text to a disk file or passes it to some other process.

   Occasionally, Emacs needs to hold and manipulate encoded text or
binary non-text data in its buffers or strings.  For example, when
Emacs visits a file, it first reads the file's text verbatim into a
buffer, and only then converts it to the internal representation.
Before the conversion, the buffer holds encoded text.

   Encoded text is not really text, as far as Emacs is concerned, but
rather a sequence of raw 8-bit bytes.  We call buffers and strings that
hold encoded text "unibyte" buffers and strings, because Emacs treats
them as a sequence of individual bytes.  Usually, Emacs displays
unibyte buffers and strings as octal codes such as `\237'.  We
recommend that you never use unibyte buffers and strings except for
manipulating encoded text or binary non-text data.

   In a buffer, the buffer-local value of the variable
`enable-multibyte-characters' specifies the representation used.  The
representation for a string is determined and recorded in the string
when the string is constructed.

 -- Variable: enable-multibyte-characters
     This variable specifies the current buffer's text representation.
     If it is non-`nil', the buffer contains multibyte text; otherwise,
     it contains unibyte encoded text or binary non-text data.

     You cannot set this variable directly; instead, use the function
     `set-buffer-multibyte' to change a buffer's representation.

     The `--unibyte' command line option does its job by setting the
     default value to `nil' early in startup.

 -- Function: position-bytes position
     Buffer positions are measured in character units.  This function
     returns the byte-position corresponding to buffer position
     POSITION in the current buffer.  This is 1 at the start of the
     buffer, and counts upward in bytes.  If POSITION is out of range,
     the value is `nil'.

 -- Function: byte-to-position byte-position
     Return the buffer position, in character units, corresponding to
     given BYTE-POSITION in the current buffer.  If BYTE-POSITION is
     out of range, the value is `nil'.  In a multibyte buffer, an
     arbitrary BYTE-POSITION may fall not on a character boundary
     but inside a multibyte sequence representing a single character;
     in this case, this function returns the buffer position of the
     character whose multibyte sequence includes BYTE-POSITION.  In
     other words, the value does not change for all byte positions that
     belong to the same character.

 -- Function: multibyte-string-p string
     Return `t' if STRING is a multibyte string, `nil' otherwise.

 -- Function: string-bytes string
     This function returns the number of bytes in STRING.  If STRING is
     a multibyte string, this can be greater than `(length STRING)'.

 -- Function: unibyte-string &rest bytes
     This function concatenates all its argument BYTES and makes the
     result a unibyte string.
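
   The following sketch shows how character counts and byte counts
differ for a Latin-1 character such as `U+00E9', which occupies two
bytes in the multibyte representation.  The function name
`my-byte-position-demo' is made up for this example, and the temporary
buffer is assumed to be multibyte, which is the usual default.

```elisp
(multibyte-string-p "abc")          ; => nil
(multibyte-string-p "\u00e9")       ; => t
(length "\u00e9")                   ; => 1
(string-bytes "\u00e9")             ; => 2
(unibyte-string 72 105)             ; => "Hi"

(defun my-byte-position-demo ()
  "Return byte positions for a buffer holding U+00E9 followed by `a'."
  (with-temp-buffer
    (insert "\u00e9a")
    (list (position-bytes 1)        ; start of the buffer
          (position-bytes 2)        ; after the two-byte character
          (position-bytes 3)        ; after the one-byte `a'
          (byte-to-position 3))))   ; back from bytes to characters

(my-byte-position-demo)
     ;; => (1 3 4 2)
```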

   ---------- Footnotes ----------

   (1) This internal representation is based on one of the encodings
defined by the Unicode Standard, called "UTF-8", for representing any
Unicode codepoint, but Emacs extends UTF-8 to represent the additional
codepoints it uses for raw 8-bit bytes and characters not unified with
Unicode.


File: elisp,  Node: Converting Representations,  Next: Selecting a Representation,  Prev: Text Representations,  Up: Non-ASCII Characters

33.2 Converting Text Representations
====================================

Emacs can convert unibyte text to multibyte; it can also convert
multibyte text to unibyte, provided that the multibyte text contains
only ASCII and 8-bit raw bytes.  In general, these conversions happen
when inserting text into a buffer, or when putting text from several
strings together in one string.  You can also explicitly convert a
string's contents to either representation.

   Emacs chooses the representation for a string based on the text from
which it is constructed.  The general rule is to convert unibyte text
to multibyte text when combining it with other multibyte text, because
the multibyte representation is more general and can hold whatever
characters the unibyte text has.

   When inserting text into a buffer, Emacs converts the text to the
buffer's representation, as specified by `enable-multibyte-characters'
in that buffer.  In particular, when you insert multibyte text into a
unibyte buffer, Emacs converts the text to unibyte, even though this
conversion cannot in general preserve all the characters that might be
in the multibyte text.  The other natural alternative, to convert the
buffer contents to multibyte, is not acceptable because the buffer's
representation is a choice made by the user that cannot be overridden
automatically.

   Converting unibyte text to multibyte text leaves ASCII characters
unchanged, and converts bytes with codes 128 through 255 to the
multibyte representation of raw eight-bit bytes.

   Converting multibyte text to unibyte converts all ASCII and
eight-bit characters to their single-byte form, but loses information
for non-ASCII characters by discarding all but the low 8 bits of each
character's codepoint.  Converting unibyte text to multibyte and back
to unibyte reproduces the original unibyte text.

   The next two functions either return the argument STRING, or a newly
created string with no text properties.

 -- Function: string-to-multibyte string
     This function returns a multibyte string containing the same
     sequence of characters as STRING.  If STRING is a multibyte string,
     it is returned unchanged.  The function assumes that STRING
     includes only ASCII characters and raw 8-bit bytes; the latter are
     converted to their multibyte representation corresponding to the
     codepoints `#x3FFF80' through `#x3FFFFF', inclusive (*note
     codepoints: Text Representations.).

 -- Function: string-to-unibyte string
     This function returns a unibyte string containing the same
     sequence of characters as STRING.  It signals an error if STRING
     contains a non-ASCII character.  If STRING is a unibyte string, it
     is returned unchanged.  Use this function for STRING arguments
     that contain only ASCII and eight-bit characters.

 -- Function: multibyte-char-to-unibyte char
     This converts the multibyte character CHAR to a unibyte character,
     and returns that character.  If CHAR is neither ASCII nor
     eight-bit, the function returns -1.

 -- Function: unibyte-char-to-multibyte char
     This converts the unibyte character CHAR to a multibyte character,
     assuming CHAR is either ASCII or a raw 8-bit byte.
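
   These conversions can be explored with a sketch such as the
following.  The variables `my-raw' and `my-multi' are hypothetical;
the byte 233 (octal 351) and the character `U+20AC' are arbitrary
sample values.

```elisp
;; A unibyte string holding the single byte 233 (octal 351).
(defvar my-raw "\351")
(defvar my-multi (string-to-multibyte my-raw))

(multibyte-string-p my-multi)                 ; => t
(aref my-multi 0)                             ; => 4194281, i.e. #x3FFFE9
(equal (string-to-unibyte my-multi) my-raw)   ; => t

(multibyte-char-to-unibyte #x3FFFE9)          ; => 233
(multibyte-char-to-unibyte ?\u20ac)           ; => -1
(unibyte-char-to-multibyte 233)               ; => 4194281

;; A character that is neither ASCII nor a raw byte cannot be converted.
(condition-case nil
    (string-to-unibyte "\u20ac")
  (error 'cannot-convert))
     ;; => cannot-convert
```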


File: elisp,  Node: Selecting a Representation,  Next: Character Codes,  Prev: Converting Representations,  Up: Non-ASCII Characters

33.3 Selecting a Representation
===============================

Sometimes it is useful to examine an existing buffer or string as
multibyte when it was unibyte, or vice versa.

 -- Function: set-buffer-multibyte multibyte
     Set the representation type of the current buffer.  If MULTIBYTE
     is non-`nil', the buffer becomes multibyte.  If MULTIBYTE is
     `nil', the buffer becomes unibyte.

     This function leaves the buffer contents unchanged when viewed as a
     sequence of bytes.  As a consequence, it can change the contents
     viewed as characters; for instance, a sequence of three bytes
     which is treated as one character in multibyte representation will
     count as three characters in unibyte representation.  Eight-bit
     characters representing raw bytes are an exception.  They are
     represented by one byte in a unibyte buffer, but when the buffer
     is set to multibyte, they are converted to two-byte sequences, and
     vice versa.

     This function sets `enable-multibyte-characters' to record which
     representation is in use.  It also adjusts various data in the
     buffer (including overlays, text properties and markers) so that
     they cover the same text as they did before.

     You cannot use `set-buffer-multibyte' on an indirect buffer,
     because indirect buffers always inherit the representation of the
     base buffer.
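
   As a small sketch of the effect on character counts (the function
name `my-representation-demo' is hypothetical, and the temporary
buffer is assumed to start out multibyte):

```elisp
(defun my-representation-demo ()
  "Return the character counts of one buffer viewed three ways."
  (with-temp-buffer
    (insert "\u00e9")                 ; one character, two bytes
    (let ((as-multibyte (buffer-size)))
      (set-buffer-multibyte nil)      ; same bytes, now two characters
      (let ((as-unibyte (buffer-size)))
        (set-buffer-multibyte t)      ; the bytes form one character again
        (list as-multibyte as-unibyte (buffer-size))))))

(my-representation-demo)
     ;; => (1 2 1)
```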

 -- Function: string-as-unibyte string
     If STRING is already a unibyte string, this function returns
     STRING itself.  Otherwise, it returns a new string with the same
     bytes as STRING, but treating each byte as a separate character
     (so that the value may have more characters than STRING); as an
     exception, each eight-bit character representing a raw byte is
     converted into a single byte.  The newly-created string contains no
     text properties.

 -- Function: string-as-multibyte string
     If STRING is a multibyte string, this function returns STRING
     itself.  Otherwise, it returns a new string with the same bytes as
     STRING, but treating each multibyte sequence as one character.
     This means that the value may have fewer characters than STRING
     has.  If a byte sequence in STRING is invalid as a multibyte
     representation of a single character, each byte in the sequence is
     treated as a raw 8-bit byte.  The newly-created string contains no
     text properties.


File: elisp,  Node: Character Codes,  Next: Character Properties,  Prev: Selecting a Representation,  Up: Non-ASCII Characters

33.4 Character Codes
====================

The unibyte and multibyte text representations use different character
codes.  The valid character codes for unibyte representation range from
0 to `#xFF' (255)--the values that can fit in one byte.  The valid
character codes for multibyte representation range from 0 to
`#x3FFFFF'.  In this code space, values 0 through `#x7F' (127) are for
ASCII characters, and values `#x80' (128) through `#x3FFF7F' (4194175)
are for non-ASCII characters.

   Emacs character codes are a superset of the Unicode standard.
Values 0 through `#x10FFFF' (1114111) correspond to Unicode characters
of the same codepoint; values `#x110000' (1114112) through `#x3FFF7F'
(4194175) represent characters that are not unified with Unicode; and
values `#x3FFF80' (4194176) through `#x3FFFFF' (4194303) represent
eight-bit raw bytes.

 -- Function: characterp charcode
     This returns `t' if CHARCODE is a valid character, and `nil'
     otherwise.

          (characterp 65)
               => t
          (characterp 4194303)
               => t
          (characterp 4194304)
               => nil

 -- Function: max-char
     This function returns the largest value that a valid character
     codepoint can have.

          (characterp (max-char))
               => t
          (characterp (1+ (max-char)))
               => nil

 -- Function: get-byte &optional pos string
     This function returns the byte at character position POS in the
     current buffer.  If the current buffer is unibyte, this is
     literally the byte at that position.  If the buffer is multibyte,
     byte values of ASCII characters are the same as character
     codepoints, whereas eight-bit raw bytes are converted to their
     8-bit codes.  The function signals an error if the character at
     POS is non-ASCII.

     The optional argument STRING means to get a byte value from that
     string instead of the current buffer.
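
   A few sample calls, given as a sketch; `my-get-byte-demo' is a
hypothetical name.  When STRING is given, POS is taken here as an
index into the string, so 0 designates its first character.

```elisp
(get-byte 0 "A")                              ; => 65
(get-byte 0 (string-to-multibyte "\351"))     ; => 233

(defun my-get-byte-demo ()
  "Return the byte at position 2 of a scratch buffer holding \"AB\"."
  (with-temp-buffer
    (insert "AB")
    (get-byte 2)))

(my-get-byte-demo)                            ; => 66

;; A non-ASCII character that is not a raw byte signals an error.
(condition-case nil
    (get-byte 0 "\u00e9")
  (error 'not-a-byte))
     ;; => not-a-byte
```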


File: elisp,  Node: Character Properties,  Next: Character Sets,  Prev: Character Codes,  Up: Non-ASCII Characters

33.5 Character Properties
=========================

A "character property" is a named attribute of a character that
specifies how the character behaves and how it should be handled during
text processing and display.  Thus, character properties are an
important part of specifying the character's semantics.

   On the whole, Emacs follows the Unicode Standard in its
implementation of character properties.  In particular, Emacs supports
the Unicode Character Property Model
(http://www.unicode.org/reports/tr23/), and the Emacs character
property database is derived from the Unicode Character Database (UCD).
See the Character Properties chapter of the Unicode Standard
(http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf), for a detailed
description of Unicode character properties and their meaning.  This
section assumes you are already familiar with that chapter of the
Unicode Standard, and want to apply that knowledge to Emacs Lisp
programs.

   In Emacs, each property has a name, which is a symbol, and a set of
possible values, whose types depend on the property; if a character
does not have a certain property, the value is `nil'.  As a general
rule, the names of character properties in Emacs are produced from the
corresponding Unicode properties by downcasing them and replacing each
`_' character with a dash `-'.  For example,
`Canonical_Combining_Class' becomes `canonical-combining-class'.
However, sometimes we shorten the names to make their use easier.

   Here is the full list of value types for all the character
properties that Emacs knows about:

`name'
     This property corresponds to the Unicode `Name' property.  The
     value is a string consisting of upper-case Latin letters A to Z,
     digits, spaces, and hyphen `-' characters.

`general-category'
     This property corresponds to the Unicode `General_Category'
     property.  The value is a symbol whose name is a 2-letter
     abbreviation of the character's classification.

`canonical-combining-class'
     Corresponds to the Unicode `Canonical_Combining_Class' property.
     The value is an integer number.

`bidi-class'
     Corresponds to the Unicode `Bidi_Class' property.  The value is a
     symbol whose name is the Unicode "directional type" of the
     character.

`decomposition'
     Corresponds to the Unicode `Decomposition_Type' and
     `Decomposition_Value' properties.  The value is a list, whose
     first element may be a symbol representing a compatibility
     formatting tag, such as `small'(1); the other elements are
     characters that give the compatibility decomposition sequence of
     this character.

`decimal-digit-value'
     Corresponds to the Unicode `Numeric_Value' property for characters
     whose `Numeric_Type' is `Decimal'.  The value is an integer number.

`digit-value'
     Corresponds to the Unicode `Numeric_Value' property for characters
     whose `Numeric_Type' is `Digit'.  The value is an integer
     number.  Examples of such characters include compatibility
     subscript and superscript digits, for which the value is the
     corresponding number.

`numeric-value'
     Corresponds to the Unicode `Numeric_Value' property for characters
     whose `Numeric_Type' is `Numeric'.  The value of this property is
     an integer or a floating-point number.  Examples of characters
     that have this property include fractions, subscripts,
     superscripts, Roman numerals, currency numerators, and encircled
     numbers.  For example, the value of this property for the character
     `U+2155' (VULGAR FRACTION ONE FIFTH) is `0.2'.

`mirrored'
     Corresponds to the Unicode `Bidi_Mirrored' property.  The value of
     this property is a symbol, either `Y' or `N'.

`old-name'
     Corresponds to the Unicode `Unicode_1_Name' property.  The value
     is a string.

`iso-10646-comment'
     Corresponds to the Unicode `ISO_Comment' property.  The value is a
     string.

`uppercase'
     Corresponds to the Unicode `Simple_Uppercase_Mapping' property.
     The value of this property is a single character.

`lowercase'
     Corresponds to the Unicode `Simple_Lowercase_Mapping' property.
     The value of this property is a single character.

`titlecase'
     Corresponds to the Unicode `Simple_Titlecase_Mapping' property.
     "Title case" is a special form of a character used when the first
     character of a word needs to be capitalized.  The value of this
     property is a single character.

 -- Function: get-char-code-property char propname
     This function returns the value of CHAR's PROPNAME property.

          (get-char-code-property ?  'general-category)
               => Zs
          (get-char-code-property ?1  'general-category)
               => Nd
          (get-char-code-property ?\u2084 'digit-value) ; subscript 4
               => 4
          (get-char-code-property ?\u2155 'numeric-value) ; one fifth
               => 1/5
          (get-char-code-property ?\u2163 'numeric-value) ; Roman IV
               => \4

 -- Function: char-code-property-description prop value
     This function returns the description string of property PROP's
     VALUE, or `nil' if VALUE has no description.

          (char-code-property-description 'general-category 'Zs)
               => "Separator, Space"
          (char-code-property-description 'general-category 'Nd)
               => "Number, Decimal Digit"
          (char-code-property-description 'numeric-value '1/5)
               => nil

 -- Function: put-char-code-property char propname value
     This function stores VALUE as the value of the property PROPNAME
     for the character CHAR.

 -- Variable: char-script-table
     The value of this variable is a char-table (*note Char-Tables::)
     that specifies, for each character, a symbol whose name is the
     script to which the character belongs, according to the Unicode
     Standard classification of the Unicode code space into
     script-specific blocks.  This char-table has a single extra slot
     whose value is the list of all script symbols.

 -- Variable: char-width-table
     The value of this variable is a char-table that specifies the
     width of each character in columns that it will occupy on the
     screen.

 -- Variable: printable-chars
     The value of this variable is a char-table that specifies, for each
     character, whether it is printable or not.  That is, if evaluating
     `(aref printable-chars char)' results in `t', the character is
     printable, and if it results in `nil', it is not.
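
   As a rough illustration, the three char-tables above can be queried
directly with `aref'.  The helper name `my-char-table-summary' is
hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs: collect what the three
;; char-tables described above record for the character CH.
(defun my-char-table-summary (ch)
  "Return a list (SCRIPT WIDTH PRINTABLE) for the character CH."
  (list (aref char-script-table ch)    ; script symbol, e.g. `latin'
        (aref char-width-table ch)     ; entry of the width table
        (aref printable-chars ch)))    ; t if printable, nil if not

(my-char-table-summary ?a)

;; The single extra slot of `char-script-table' lists all script symbols.
(memq 'greek (char-table-extra-slot char-script-table 0))
```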

   ---------- Footnotes ----------

   (1) Note that the Unicode spec writes these tag names inside `<..>'
brackets.  The tag names in Emacs do not include the brackets; e.g.,
Unicode specifies `<small>' where Emacs uses `small'.


File: elisp,  Node: Character Sets,  Next: Scanning Charsets,  Prev: Character Properties,  Up: Non-ASCII Characters

33.6 Character Sets
===================

An Emacs "character set", or "charset", is a set of characters in which
each character is assigned a numeric code point.  (The Unicode Standard
calls this a "coded character set".)  Each Emacs charset has a name
which is a symbol.  A single character can belong to any number of
different character sets, but it will generally have a different code
point in each charset.  Examples of character sets include `ascii',
`iso-8859-1', `greek-iso8859-7', and `windows-1255'.  The code point
assigned to a character in a charset is usually different from its code
point used in Emacs buffers and strings.

   Emacs defines several special character sets.  The character set
`unicode' includes all the characters whose Emacs code points are in
the range `0..#x10FFFF'.  The character set `emacs' includes all ASCII
and non-ASCII characters.  Finally, the `eight-bit' charset includes
the 8-bit raw bytes; Emacs uses it to represent raw bytes encountered
in text.

 -- Function: charsetp object
     Returns `t' if OBJECT is a symbol that names a character set,
     `nil' otherwise.

 -- Variable: charset-list
     The value is a list of all defined character set names.

 -- Function: charset-priority-list &optional highestp
     This function returns a list of all defined character sets
     ordered by their priority.  If HIGHESTP is non-`nil', the function
     returns a single character set of the highest priority.

 -- Function: set-charset-priority &rest charsets
     This function makes CHARSETS the highest priority character sets.

 -- Function: char-charset character &optional restriction
     This function returns the name of the character set of highest
     priority that CHARACTER belongs to.  ASCII characters are an
     exception: for them, this function always returns `ascii'.

     If RESTRICTION is non-`nil', it should be a list of charsets to
     search.  Alternatively, it can be a coding system, in which case
     the returned charset must be supported by that coding system
     (*note Coding Systems::).
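
   The following sketch exercises the functions described so far.  The
symbol `no-such-charset' is assumed to be undefined, and the charsets
`iso-8859-1' and `greek-iso8859-7' are assumed to be defined, as they
are in a standard Emacs build.

```elisp
(charsetp 'ascii)             ; `ascii' names a charset
(charsetp 'no-such-charset)   ; assumed not to name one

;; ASCII characters always belong to `ascii'.
(char-charset ?a)

;; Restrict the search to an explicit list of charsets.  Greek small
;; alpha (U+03B1) is in ISO 8859-7 but not in ISO 8859-1.
(char-charset ?\u03b1 '(iso-8859-1 greek-iso8859-7))

;; With a non-nil argument, only the highest-priority charset is
;; returned; it is the first element of the full list.
(eq (charset-priority-list t) (car (charset-priority-list)))
```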

 -- Function: charset-plist charset
     This function returns the property list of the character set
     CHARSET.  Although CHARSET is a symbol, this is not the same as
     the property list of that symbol.  Charset properties include
     important information about the charset, such as its documentation
     string, short name, etc.

 -- Function: put-charset-property charset propname value
     This function sets the PROPNAME property of CHARSET to the given
     VALUE.

 -- Function: get-charset-property charset propname
     This function returns the value of CHARSET's property PROPNAME.

 -- Command: list-charset-chars charset
     This command displays a list of characters in the character set
     CHARSET.

   Emacs can convert between its internal representation of a character
and the character's codepoint in a specific charset.  The following two
functions support these conversions.

 -- Function: decode-char charset code-point
     This function decodes a character that is assigned a CODE-POINT in
     CHARSET, to the corresponding Emacs character, and returns it.  If
     CHARSET doesn't contain a character of that code point, the value
     is `nil'.  If CODE-POINT doesn't fit in a Lisp integer (*note
     most-positive-fixnum: Integer Basics.), it can be specified as a
     cons cell `(HIGH . LOW)', where LOW are the lower 16 bits of the
     value and HIGH are the high 16 bits.

 -- Function: encode-char char charset
     This function returns the code point assigned to the character
     CHAR in CHARSET.  If the result does not fit in a Lisp integer, it
     is returned as a cons cell `(HIGH . LOW)' that fits the second
     argument of `decode-char' above.  If CHARSET doesn't have a
     codepoint for CHAR, the value is `nil'.
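
   As a minimal sketch of these two conversions, consider Greek small
alpha, which is U+03B1 in Emacs and has the code point #xE1 in ISO
8859-7.  The helper `my-recode-code-point' is a hypothetical name used
only for illustration.

```elisp
(decode-char 'greek-iso8859-7 #xE1)       ; the character alpha
(encode-char ?\u03b1 'greek-iso8859-7)    ; its code point in ISO 8859-7
(encode-char ?\u03b1 'iso-8859-1)         ; nil: alpha is not in Latin-1

;; Hypothetical helper: convert CODE from FROM-CHARSET to TO-CHARSET,
;; returning nil if either charset lacks the character.
(defun my-recode-code-point (code from-charset to-charset)
  (let ((ch (decode-char from-charset code)))
    (and ch (encode-char ch to-charset))))

(my-recode-code-point #xE1 'greek-iso8859-7 'unicode)
```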

   The following function comes in handy for applying a certain
function to all or part of the characters in a charset:

 -- Function: map-charset-chars function charset &optional arg
          from-code to-code
     Call FUNCTION for characters in CHARSET.  FUNCTION is called with
     two arguments.  The first one is a cons cell `(FROM . TO)', where
     FROM and TO indicate a range of characters contained in CHARSET.
     The second argument passed to FUNCTION is ARG.

     By default, the range of codepoints passed to FUNCTION includes
     all the characters in CHARSET, but optional arguments FROM-CODE
     and TO-CODE limit that to the range of characters between these
     two codepoints of CHARSET.  If either of them is `nil', it
     defaults to the first or last codepoint of CHARSET, respectively.
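
   One way to use `map-charset-chars' is to collect the ranges it
reports.  This is only a sketch; `my-charset-ranges' is a hypothetical
helper, and how many ranges FUNCTION receives for a given charset is
not something the example relies on.

```elisp
;; Hypothetical helper: return the list of (FROM . TO) ranges that
;; `map-charset-chars' reports for CHARSET, optionally limited to the
;; code points FROM-CODE through TO-CODE.
(defun my-charset-ranges (charset &optional from-code to-code)
  (let ((ranges nil))
    (map-charset-chars
     (lambda (range _arg)
       ;; Copy the cons cell in case the caller reuses it.
       (push (cons (car range) (cdr range)) ranges))
     charset nil from-code to-code)
    (nreverse ranges)))

(my-charset-ranges 'ascii)          ; every character of `ascii'
(my-charset-ranges 'ascii ?a ?z)    ; only the lower-case letters
```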


File: elisp,  Node: Scanning Charsets,  Next: Translation of Characters,  Prev: Character Sets,  Up: Non-ASCII Characters

33.7 Scanning for Character Sets
================================

Sometimes it is useful to find out which character set a particular
character belongs to.  One use for this is in determining which coding
systems (*note Coding Systems::) are capable of representing all of the
text in question; another is to determine the font(s) for displaying
that text.

 -- Function: charset-after &optional pos
     This function returns the charset of highest priority containing
     the character at position POS in the current buffer.  If POS is
     omitted or `nil', it defaults to the current value of point.  If
     POS is out of range, the value is `nil'.

 -- Function: find-charset-region beg end &optional translation
     This function returns a list of the character sets of highest
     priority that contain characters in the current buffer between
     positions BEG and END.

     The optional argument TRANSLATION specifies a translation table to
     use for scanning the text (*note Translation of Characters::).  If
     it is non-`nil', then each character in the region is translated
     through this table, and the value returned describes the translated
     characters instead of the characters actually in the buffer.

 -- Function: find-charset-string string &optional translation
     This function returns a list of character sets of highest priority
     that contain characters in STRING.  It is just like
     `find-charset-region', except that it applies to the contents of
     STRING instead of part of the current buffer.
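
   A short sketch of the scanning functions follows.  The non-ASCII
test character (Greek small alpha, U+03B1) is an arbitrary choice, and
which charset is reported for it depends on the current charset
priorities.

```elisp
(find-charset-string "hello")     ; pure ASCII text

(with-temp-buffer
  (insert "a\u03b1")
  (list (charset-after 1)                 ; charset of the "a"
        (charset-after (point-max))       ; no character at that position
        (find-charset-region (point-min) (point-max))))
```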


File: elisp,  Node: Translation of Characters,  Next: Coding Systems,  Prev: Scanning Charsets,  Up: Non-ASCII Characters

33.8 Translation of Characters
==============================

A "translation table" is a char-table (*note Char-Tables::) that
specifies a mapping of characters into characters.  These tables are
used in encoding and decoding, and for other purposes.  Some coding
systems specify their own particular translation tables; there are also
default translation tables which apply to all other coding systems.

   A translation table has two extra slots.  The first is either `nil'
or a translation table that performs the reverse translation; the
second is the maximum number of characters to look up for translating
sequences of characters (see the description of
`make-translation-table-from-alist' below).

 -- Function: make-translation-table &rest translations
     This function returns a translation table based on the argument
     TRANSLATIONS.  Each element of TRANSLATIONS should be a list of
     elements of the form `(FROM . TO)'; this says to translate the
     character FROM into TO.

     The arguments and the forms in each argument are processed in
     order, and if a previous form already translates TO to some other
     character, say TO-ALT, FROM is also translated to TO-ALT.
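
   The ordering rule can be seen in a small sketch.  The character
pairs are arbitrary and chosen only for illustration.

```elisp
;; A single translation: ?a becomes ?b.
(let ((table (make-translation-table '((?a . ?b)))))
  (aref table ?a))

;; Here ?b is already translated to ?c when (?a . ?b) is processed,
;; so ?a ends up translated to ?c as well.
(let ((table (make-translation-table '((?b . ?c)) '((?a . ?b)))))
  (list (aref table ?a) (aref table ?b)))
```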

   During decoding, the translation table's translations are applied to
the characters that result from ordinary decoding.  If a coding system
has the property `:decode-translation-table', that specifies the
translation table to use, or a list of translation tables to apply in
sequence.  (This is a property of the coding system, as returned by
`coding-system-get', not a property of the symbol that is the coding
system's name.  *Note Basic Concepts of Coding Systems: Coding System
Basics.)  Finally, if `standard-translation-table-for-decode' is
non-`nil', the resulting characters are translated by that table.

   During encoding, the translation table's translations are applied to
the characters in the buffer, and the result of translation is actually
encoded.  If a coding system has property `:encode-translation-table',
that specifies the translation table to use, or a list of translation
tables to apply in sequence.  In addition, if the variable
`standard-translation-table-for-encode' is non-`nil', it specifies the
translation table to use for translating the result.

 -- Variable: standard-translation-table-for-decode
     This is the default translation table for decoding.  If a coding
     system specifies its own translation tables, the table that is the
     value of this variable, if non-`nil', is applied after them.

 -- Variable: standard-translation-table-for-encode
     This is the default translation table for encoding.  If a coding
     system specifies its own translation tables, the table that is the
     value of this variable, if non-`nil', is applied after them.

 -- Variable: translation-table-for-input
     Self-inserting characters are translated through this translation
     table before they are inserted.  Search commands also translate
     their input through this table, so they can compare more reliably
     with what's in the buffer.

     This variable automatically becomes buffer-local when set.

 -- Function: make-translation-table-from-vector vec
     This function returns a translation table made from VEC, an array
     of 256 elements that maps bytes (values 0 through #xFF) to
     characters.  Elements may be `nil' for untranslated bytes.  The
     returned table has a translation table for reverse mapping in the
     first extra slot, and the value `1' in the second extra slot.

     This function provides an easy way to make a private coding system
     that maps each byte to a specific character.  You can specify the
     returned table and the reverse translation table using the
     properties `:decode-translation-table' and
     `:encode-translation-table' respectively in the PROPS argument to
     `define-coding-system'.
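
   The sketch below builds such a table.  The helper name
`my-byte-table' and the choice of mapping byte #x80 to the euro sign
(U+20AC) are assumptions made for illustration; every other byte is
mapped to the character with the same code.

```elisp
;; Hypothetical helper: a byte-to-character table that is the identity
;; except for byte #x80, which maps to U+20AC.
(defun my-byte-table ()
  (let ((vec (make-vector 256 0)))
    (dotimes (i 256)
      (aset vec i i))
    (aset vec #x80 #x20AC)
    (make-translation-table-from-vector vec)))

(let ((table (my-byte-table)))
  (list (aref table #x80)                               ; forward mapping
        (aref (char-table-extra-slot table 0) #x20AC)   ; reverse mapping
        (char-table-extra-slot table 1)))               ; second extra slot
```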

 -- Function: make-translation-table-from-alist alist
     This function is similar to `make-translation-table' but returns a
     complex translation table rather than a simple one-to-one mapping.
     Each element of ALIST is of the form `(FROM . TO)', where FROM and
     TO are either characters or vectors specifying a sequence of
     characters.  If FROM is a character, that character is translated
     to TO (i.e. to a character or a character sequence).  If FROM is a
     vector of characters, that sequence is translated to TO.  The
     returned table has a translation table for reverse mapping in the
     first extra slot, and the maximum length of all the FROM character
     sequences in the second extra slot.
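
   As a small sketch, the following alist mixes a single-character
translation with a two-character sequence.  The characters are
arbitrary.

```elisp
(let ((table (make-translation-table-from-alist
              '((?a . ?b)            ; character to character
                ([?x ?y] . ?z)))))   ; two-character sequence to character
  (list (aref table ?a)                      ; translation of ?a
        (char-table-extra-slot table 1)))    ; longest FROM sequence
```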


File: elisp,  Node: Coding Systems,  Next: Input Methods,  Prev: Translation of Characters,  Up: Non-ASCII Characters

33.9 Coding Systems
===================

When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified by a
particular "coding system".

   How to define a coding system is an arcane matter, and is not
documented here.

* Menu:

* Coding System Basics::        Basic concepts.
* Encoding and I/O::            How file I/O functions handle coding systems.
* Lisp and Coding Systems::     Functions to operate on coding system names.
* User-Chosen Coding Systems::  Asking the user to choose a coding system.
* Default Coding Systems::      Controlling the default choices.
* Specifying Coding Systems::   Requesting a particular coding system
                                    for a single file operation.
* Explicit Encoding::           Encoding or decoding text without doing I/O.
* Terminal I/O Encoding::       Use of encoding for terminal I/O.
* MS-DOS File Types::           How DOS "text" and "binary" files
                                    relate to coding systems.


File: elisp,  Node: Coding System Basics,  Next: Encoding and I/O,  Up: Coding Systems

33.9.1 Basic Concepts of Coding Systems
---------------------------------------

"Character code conversion" involves conversion between the internal
representation of characters used inside Emacs and some other encoding.
Emacs supports many different encodings, in that it can convert to and
from them.  For example, it can convert text to or from encodings such
as Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and several variants of
ISO 2022.  In some cases, Emacs supports several alternative encodings
for the same characters; for example, there are three coding systems
for the Cyrillic (Russian) alphabet: ISO, Alternativnyj, and KOI8.

   Every coding system specifies a particular set of character code
conversions, but the coding system `undecided' is special: it leaves
the choice unspecified, to be chosen heuristically for each file, based
on the file's data.

   In general, a coding system doesn't guarantee roundtrip identity:
decoding a byte sequence using a coding system, then encoding the
resulting text in the same coding system, can produce a different byte
sequence.  But some coding systems do guarantee that the byte sequence
will be the same as what you originally decoded.  Here are a few
examples:

     iso-8859-1, utf-8, big5, shift_jis, euc-jp

   Encoding buffer text and then decoding the result can also fail to
reproduce the original text.  For instance, if you encode a character
with a coding system which does not support that character, the result
is unpredictable, and thus decoding it using the same coding system may
produce a different text.  Currently, Emacs can't report errors that
result from encoding unsupported characters.
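
   The second kind of failure can be checked from Lisp.  This sketch
uses `encode-coding-string' and `decode-coding-string' (*note Explicit
Encoding::); the helper name `my-text-round-trips-p' is hypothetical.

```elisp
;; Hypothetical helper: non-nil if TEXT is unchanged after being
;; encoded and then decoded with CODING-SYSTEM.
(defun my-text-round-trips-p (text coding-system)
  (equal text
         (decode-coding-string
          (encode-coding-string text coding-system)
          coding-system)))

(my-text-round-trips-p "caf\u00e9" 'utf-8)           ; supported text
(my-text-round-trips-p "\u03b1\u03b2" 'iso-8859-1)   ; Greek is unsupported
```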

   "End of line conversion" handles three different conventions used on
various systems for representing end of line in files.  The Unix
convention, used on GNU and Unix systems, is to use the linefeed
character (also called newline).  The DOS convention, used on
MS-Windows and MS-DOS systems, is to use a carriage-return and a
linefeed at the end of a line.  The Mac convention is to use just
carriage-return.

   "Base coding systems" such as `latin-1' leave the end-of-line
conversion unspecified, to be chosen based on the data.  "Variant
coding systems" such as `latin-1-unix', `latin-1-dos' and `latin-1-mac'
specify the end-of-line conversion explicitly as well.  Most base
coding systems have three corresponding variants whose names are formed
by adding `-unix', `-dos' and `-mac'.

   The coding system `raw-text' is special in that it prevents
character code conversion, and causes the buffer visited with this
coding system to be a unibyte buffer.  For historical reasons, you can
save both unibyte and multibyte text with this coding system.  When you
use `raw-text' to encode multibyte text, it does perform one character
code conversion: it converts eight-bit characters to their single-byte
external representation.  `raw-text' does not specify the end-of-line
conversion, allowing that to be determined as usual by the data, and
has the usual three variants which specify the end-of-line conversion.

   `no-conversion' (and its alias `binary') is equivalent to
`raw-text-unix': it specifies no conversion of either character codes
or end-of-line.

   The coding system `utf-8-emacs' specifies that the data is
represented in the internal Emacs encoding (*note Text
Representations::).  This is like `raw-text' in that no code conversion
happens, but different in that the result is multibyte data.  The name
`emacs-internal' is an alias for `utf-8-emacs'.

 -- Function: coding-system-get coding-system property
     This function returns the specified property of the coding system
     CODING-SYSTEM.  Most coding system properties exist for internal
     purposes, but one that you might find useful is `:mime-charset'.
     That property's value is the name used in MIME for the character
     coding which this coding system can read and write.  Examples:

          (coding-system-get 'iso-latin-1 :mime-charset)
               => iso-8859-1
          (coding-system-get 'iso-2022-cn :mime-charset)
               => iso-2022-cn
          (coding-system-get 'cyrillic-koi8 :mime-charset)
               => koi8-r

     The value of the `:mime-charset' property is also defined as an
     alias for the coding system.

 -- Function: coding-system-aliases coding-system
     This function returns the list of aliases of CODING-SYSTEM.
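
   For instance, assuming the standard Latin-1 coding system names,
the aliases and the MIME name can be inspected like this:

```elisp
;; The returned list contains the names that refer to the same coding
;; system, such as `latin-1' and `iso-8859-1'.
(coding-system-aliases 'iso-latin-1)

;; Properties can be looked up through any of the aliases.
(coding-system-get 'latin-1 :mime-charset)
```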


File: elisp,  Node: Encoding and I/O,  Next: Lisp and Coding Systems,  Prev: Coding System Basics,  Up: Coding Systems

33.9.2 Encoding and I/O
-----------------------

The principal purpose of coding systems is for use in reading and
writing files.  The function `insert-file-contents' uses a coding
system to decode the file data, and `write-region' uses one to encode
the buffer contents.

   You can specify the coding system to use either explicitly (*note
Specifying Coding Systems::), or implicitly using a default mechanism
(*note Default Coding Systems::).  But these methods may not completely
specify what to do.  For example, they may choose a coding system such
as `undecided' which leaves the character code conversion to be
determined from the data.  In these cases, the I/O operation finishes
the job of choosing a coding system.  Very often you will want to find
out afterwards which coding system was chosen.

 -- Variable: buffer-file-coding-system
     This buffer-local variable records the coding system used for
     saving the buffer and for writing part of the buffer with
     `write-region'.  If the text to be written cannot be safely
     encoded using the coding system specified by this variable, these
     operations select an alternative encoding by calling the function
     `select-safe-coding-system' (*note User-Chosen Coding Systems::).
     If selecting a different encoding requires asking the user to
     specify a coding system, `buffer-file-coding-system' is updated to
     the newly selected coding system.

     `buffer-file-coding-system' does _not_ affect sending text to a
     subprocess.

 -- Variable: save-buffer-coding-system
     This variable specifies the coding system for saving the buffer (by
     overriding `buffer-file-coding-system').  Note that it is not used
     for `write-region'.

     When a command to save the buffer starts out to use
     `buffer-file-coding-system' (or `save-buffer-coding-system'), and
     that coding system cannot handle the actual text in the buffer,
     the command asks the user to choose another coding system (by
     calling `select-safe-coding-system').  After that happens, the
     command also updates `buffer-file-coding-system' to represent the
     coding system that the user specified.

 -- Variable: last-coding-system-used
     I/O operations for files and subprocesses set this variable to the
     coding system name that was used.  The explicit encoding and
     decoding functions (*note Explicit Encoding::) set it too.

     *Warning:* Since receiving subprocess output sets this variable,
     it can change whenever Emacs waits; therefore, you should copy the
     value shortly after the function call that stores the value you are
     interested in.
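
   The advice in the warning can be sketched as follows: copy
`last-coding-system-used' into a local variable immediately after the
I/O call.  The helper name `my-file-coding-used-demo' and the temporary
file prefix are hypothetical; `coding-system-for-read' and
`coding-system-for-write' are used here only to make the example
deterministic.

```elisp
;; Hypothetical helper: write a small UTF-8 file, read it back, and
;; return the coding system that the read operation reported.
(defun my-file-coding-used-demo ()
  (let ((file (make-temp-file "coding-demo"))
        used)
    (unwind-protect
        (progn
          (let ((coding-system-for-write 'utf-8-unix))
            (write-region "caf\u00e9\n" nil file nil 'silent))
          (with-temp-buffer
            (let ((coding-system-for-read 'utf-8))
              (insert-file-contents file))
            ;; Copy the value right away, before Emacs can wait.
            (setq used last-coding-system-used)))
      (delete-file file))
    used))

(my-file-coding-used-demo)
```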

   The variable `selection-coding-system' specifies how to encode
selections for the window system.  *Note Window System Selections::.

 -- Variable: file-name-coding-system
     The variable `file-name-coding-system' specifies the coding system
     to use for encoding file names.  Emacs encodes file names using
     that coding system for all file operations.  If
     `file-name-coding-system' is `nil', Emacs uses a default coding
     system determined by the selected language environment.  In the
     default language environment, any non-ASCII characters in file
     names are not encoded specially; they appear in the file system
     using the internal Emacs representation.

   *Warning:* if you change `file-name-coding-system' (or the language
environment) in the middle of an Emacs session, problems can result if
you have already visited files whose names were encoded using the
earlier coding system and are handled differently under the new coding
system.  If you try to save one of these buffers under the visited file
name, saving may use the wrong file name, or it may get an error.  If
such a problem happens, use `C-x C-w' to specify a new file name for
that buffer.


File: elisp,  Node: Lisp and Coding Systems,  Next: User-Chosen Coding Systems,  Prev: Encoding and I/O,  Up: Coding Systems

33.9.3 Coding Systems in Lisp
-----------------------------

Here are the Lisp facilities for working with coding systems:

 -- Function: coding-system-list &optional base-only
     This function returns a list of all coding system names (symbols).
     If BASE-ONLY is non-`nil', the value includes only the base coding
     systems.  Otherwise, it includes alias and variant coding systems
     as well.

 -- Function: coding-system-p object
     This function returns `t' if OBJECT is a coding system name or if
     OBJECT is `nil'.

 -- Function: check-coding-system coding-system
     This function checks the validity of CODING-SYSTEM.  If that is
     valid, it returns CODING-SYSTEM.  If CODING-SYSTEM is `nil', the
     function returns `nil'.  For any other values, it signals an error
     whose `error-symbol' is `coding-system-error' (*note signal:
     Signaling Errors.).
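
   A minimal sketch of validating a name without letting the error
propagate follows.  The helper `my-valid-coding-system-p' and the
symbol `no-such-coding-system' are hypothetical.

```elisp
(coding-system-p 'utf-8)   ; a coding system name
(coding-system-p nil)      ; `nil' is accepted too

;; Hypothetical helper: t if NAME is a valid, non-nil coding system;
;; nil otherwise, including when `check-coding-system' signals
;; `coding-system-error'.
(defun my-valid-coding-system-p (name)
  (condition-case nil
      (and (check-coding-system name) t)
    (coding-system-error nil)))

(my-valid-coding-system-p 'utf-8)
(my-valid-coding-system-p 'no-such-coding-system)
```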

 -- Function: coding-system-eol-type coding-system
     This function returns the type of end-of-line (a.k.a. "eol")
     conversion used by CODING-SYSTEM.  If CODING-SYSTEM specifies a
     certain eol conversion, the return value is an integer 0, 1, or 2,
     standing for `unix', `dos', and `mac', respectively.  If
     CODING-SYSTEM doesn't specify eol conversion explicitly, the
     return value is a vector of coding systems, each one with one of
     the possible eol conversion types, like this:

          (coding-system-eol-type 'latin-1)
               => [latin-1-unix latin-1-dos latin-1-mac]

     If this function returns a vector, Emacs will decide, as part of
     the text encoding or decoding process, what eol conversion to use.
     For decoding, the end-of-line format of the text is auto-detected,
     and the eol conversion is set to match it (e.g., DOS-style CRLF
     format will imply `dos' eol conversion).  For encoding, the eol
     conversion is taken from the appropriate default coding system
     (e.g., the default value of `buffer-file-coding-system' for
     `buffer-file-coding-system'), or from the default eol conversion
     appropriate for the underlying platform.

 -- Function: coding-system-change-eol-conversion coding-system eol-type
     This function returns a coding system which is like CODING-SYSTEM
     except for its eol conversion, which is specified by EOL-TYPE.
     EOL-TYPE should be `unix', `dos', `mac', or `nil'.  If it is
     `nil', the returned coding system determines the end-of-line
     conversion from the data.

     EOL-TYPE may also be 0, 1 or 2, standing for `unix', `dos' and
     `mac', respectively.

 -- Function: coding-system-change-text-conversion eol-coding
          text-coding
     This function returns a coding system which uses the end-of-line
     conversion of EOL-CODING, and the text conversion of TEXT-CODING.
     If TEXT-CODING is `nil', it returns `undecided', or one of its
     variants according to EOL-CODING.
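
   The three functions above can be tried together.  This sketch
assumes the standard coding systems `utf-8' and `iso-latin-1' and
their eol variants.

```elisp
(coding-system-eol-type 'utf-8)       ; a vector: eol type left to the data
(coding-system-eol-type 'utf-8-dos)   ; an integer standing for `dos'

;; Add an explicit eol conversion, then remove it again.
(coding-system-change-eol-conversion 'utf-8 'dos)
(coding-system-change-eol-conversion 'utf-8-dos nil)

;; Take the eol conversion from `utf-8-dos' and the text conversion
;; from `iso-latin-1'.
(coding-system-change-text-conversion 'utf-8-dos 'iso-latin-1)
```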

 -- Function: find-coding-systems-region from to
     This function returns a list of coding systems that could be used
     to encode a text between FROM and TO.  All coding systems in the
     list can safely encode any multibyte characters in that portion of
     the text.

     If the text contains no multibyte characters, the function returns
     the list `(undecided)'.

 -- Function: find-coding-systems-string string
     This function returns a list of coding systems that could be used
     to encode the text of STRING.  All coding systems in the list can
     safely encode any multibyte characters in STRING.  If the text
     contains no multibyte characters, this returns the list
     `(undecided)'.
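
   As a brief sketch, the following compares ASCII-only text with
text containing a Greek letter.  The exact contents and order of the
second list depend on the coding systems defined in the session.

```elisp
(find-coding-systems-string "abc")       ; no multibyte characters

;; Greek small alpha can be encoded by `utf-8', among others, but not
;; by `iso-latin-1'.
(let ((systems (find-coding-systems-string "\u03b1")))
  (list (and (memq 'utf-8 systems) t)
        (and (memq 'iso-latin-1 systems) t)))
```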

 -- Function: find-coding-systems-for-charsets charsets
     This function returns a list of coding systems that could be used
     to encode all the character sets in the list CHARSETS.

 -- Function: check-coding-systems-region start end coding-system-list
     This function checks whether coding systems in the list
     CODING-SYSTEM-LIST can encode all the characters in the region
     between START and END.  If all of the coding systems in the list
     can encode the specified text, the function returns `nil'.  If
     some coding systems cannot encode some of the characters, the
     value is an alist, each element of which has the form
     `(CODING-SYSTEM1 POS1 POS2 ...)', meaning that CODING-SYSTEM1
     cannot encode characters at buffer positions POS1, POS2, ....

     START may be a string, in which case END is ignored and the
     returned value references string indices instead of buffer
     positions.
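
   A minimal sketch using a string instead of a buffer region is shown
below.  The test string is arbitrary; its second character (index 1)
is Greek small alpha, which `iso-latin-1' cannot encode.

```elisp
;; Only coding systems that fail appear in the result, each with the
;; string indices of the characters it cannot encode.
(check-coding-systems-region "a\u03b1" nil '(iso-latin-1 utf-8))

;; When every listed coding system can encode the text, the result
;; is nil.
(check-coding-systems-region "abc" nil '(iso-latin-1 utf-8))
```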

 -- Function: detect-coding-region start end &optional highest
     This function chooses a plausible coding system for decoding the
     text from START to END.  This text should be a byte sequence, i.e.
     unibyte text or multibyte text with only ASCII and eight-bit
     characters (*note Explicit Encoding::).

     Normally this function returns a list of coding systems that could
     handle decoding the text that was scanned.  They are listed in
     order of decreasing priority.  But if HIGHEST is non-`nil', then
     the return value is just one coding system, the one that is
     highest in priority.

     If the region contains only ASCII characters except for such
     ISO-2022 control characters as `ESC', the value is
     `undecided' or `(undecided)', or a variant specifying end-of-line
     conversion, if that can be deduced from the text.

     If the region contains null bytes, the value is `no-conversion',
     even if the region contains text encoded in some coding system.

 -- Function: detect-coding-string string &optional highest
     This function is like `detect-coding-region' except that it
     operates on the contents of STRING instead of bytes in the buffer.
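
   The following sketch shows the two return shapes.  What is detected
for the encoded Greek text depends on the coding system priorities in
effect, so no particular result is assumed for it.

```elisp
(detect-coding-string "abc")      ; a list of candidates
(detect-coding-string "abc" t)    ; just the highest-priority candidate

;; Detection works on bytes, so encode some text first.
(let ((bytes (encode-coding-string "\u03b1\u03b2\u03b3" 'utf-8)))
  (detect-coding-string bytes))
```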

 -- Variable: inhibit-null-byte-detection
     If this variable has a non-`nil' value, null bytes are ignored
     when detecting the encoding of a region or a string.  This makes
     it possible to correctly detect the encoding of text that contains
     null bytes, such as Info files with Index nodes.

 -- Variable: inhibit-iso-escape-detection
     If this variable has a non-`nil' value, ISO-2022 escape sequences
     are ignored when detecting the encoding of a region or a string.
     The result is that no text is ever detected as encoded in some
     ISO-2022 encoding, and all escape sequences become visible in a
     buffer.  *Warning:* _Use this variable with extreme caution,
     because many files in the Emacs distribution use ISO-2022
     encoding._

 -- Function: coding-system-charset-list coding-system
     This function returns the list of character sets (*note Character
     Sets::) supported by CODING-SYSTEM.  Some coding systems that
     support too many character sets to list them all yield special
     values:
        * If CODING-SYSTEM supports all the ISO-2022 charsets, the value
          is `iso-2022'.

        * If CODING-SYSTEM supports all Emacs characters, the value is
          `(emacs)'.

        * If CODING-SYSTEM supports all emacs-mule characters, the value
          is `emacs-mule'.

        * If CODING-SYSTEM supports all Unicode characters, the value is
          `(unicode)'.
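
   For example, assuming the standard definitions of `iso-latin-1' and
`utf-8':

```elisp
(coding-system-charset-list 'iso-latin-1)   ; an ordinary charset list
(coding-system-charset-list 'utf-8)         ; the special value for Unicode
```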

   *Note Process Information: Coding systems for a subprocess, in
particular the description of the functions `process-coding-system' and
`set-process-coding-system', for how to examine or set the coding
systems used for I/O to a subprocess.


File: elisp,  Node: User-Chosen Coding Systems,  Next: Default Coding Systems,  Prev: Lisp and Coding Systems,  Up: Coding Systems

33.9.4 User-Chosen Coding Systems
---------------------------------

 -- Function: select-safe-coding-system from to &optional
          default-coding-system accept-default-p file
     This function selects a coding system for encoding specified text,
     asking the user to choose if necessary.  Normally the specified
     text is the text in the current buffer between FROM and TO.  If
     FROM is a string, the string specifies the text to encode, and TO
     is ignored.

     If the specified text includes raw bytes (*note Text
     Representations::), `select-safe-coding-system' suggests
     `raw-text' for its encoding.

     If DEFAULT-CODING-SYSTEM is non-`nil', that is the first coding
     system to try; if that can handle the text,
     `select-safe-coding-system' returns that coding system.  It can
     also be a list of coding systems; then the function tries each of
     them one by one.  After trying all of them, it next tries the
     current buffer's value of `buffer-file-coding-system' (if it is not
     `undecided'), then the default value of
     `buffer-file-coding-system' and finally the user's most preferred
     coding system, which the user can set using the command
     `prefer-coding-system' (*note Recognizing Coding Systems:
     (emacs)Recognize Coding.).

     If one of those coding systems can safely encode all the specified
     text, `select-safe-coding-system' chooses it and returns it.
     Otherwise, it asks the user to choose from a list of coding systems
     which can encode all the text, and returns the user's choice.

     DEFAULT-CODING-SYSTEM can also be a list whose first element is `t'
     and whose other elements are coding systems.  Then, if no coding
     system in the list can handle the text, `select-safe-coding-system'
     queries the user immediately, without trying any of the three
     alternatives described above.

     The optional argument ACCEPT-DEFAULT-P, if non-`nil', should be a
     function to determine whether a coding system selected without
     user interaction is acceptable.  `select-safe-coding-system' calls
     this function with one argument, the base coding system of the
     selected coding system.  If ACCEPT-DEFAULT-P returns `nil',
     `select-safe-coding-system' rejects the silently selected coding
     system, and asks the user to select a coding system from a list of
     possible candidates.

     If the variable `select-safe-coding-system-accept-default-p' is
     non-`nil', it should be a function taking a single argument.  It
     is used in place of ACCEPT-DEFAULT-P, overriding any value
     supplied for this argument.

     As a final step, before returning the chosen coding system,
     `select-safe-coding-system' checks whether that coding system is
     consistent with what would be selected if the contents of the
     region were read from a file.  (If not, this could lead to data
     corruption in a file subsequently re-visited and edited.)
     Normally, `select-safe-coding-system' uses `buffer-file-name' as
     the file for this purpose, but if FILE is non-`nil', it uses that
     file instead (this can be relevant for `write-region' and similar
     functions).  If it detects an apparent inconsistency,
     `select-safe-coding-system' queries the user before selecting the
     coding system.

   Here are two functions you can use to let the user specify a coding
system, with completion.  *Note Completion::.

 -- Function: read-coding-system prompt &optional default
     This function reads a coding system using the minibuffer,
     prompting with string PROMPT, and returns the coding system name
     as a symbol.  If the user enters null input, DEFAULT specifies
     which coding system to return.  It should be a symbol or a string.

 -- Function: read-non-nil-coding-system prompt
     This function reads a coding system using the minibuffer,
     prompting with string PROMPT, and returns the coding system name
     as a symbol.  If the user tries to enter null input, it asks the
     user to try again.  *Note Coding Systems::.


File: elisp,  Node: Default Coding Systems,  Next: Specifying Coding Systems,  Prev: User-Chosen Coding Systems,  Up: Coding Systems

33.9.5 Default Coding Systems
-----------------------------

This section describes variables that specify the default coding system
for certain files or when running certain subprograms, and the function
that I/O operations use to access them.

   The idea of these variables is that you set them once and for all to
the defaults you want, and then do not change them again.  To specify a
particular coding system for a particular operation in a Lisp program,
don't change these variables; instead, override them using
`coding-system-for-read' and `coding-system-for-write' (*note
Specifying Coding Systems::).

 -- User Option: auto-coding-regexp-alist
     This variable is an alist of text patterns and corresponding coding
     systems.  Each element has the form `(REGEXP . CODING-SYSTEM)'; a
     file whose first few kilobytes match REGEXP is decoded with
     CODING-SYSTEM when its contents are read into a buffer.  The
     settings in this alist take priority over `coding:' tags in the
     files and the contents of `file-coding-system-alist' (see below).
     The default value is set so that Emacs automatically recognizes
     mail files in Babyl format and reads them with no code conversions.

 -- User Option: file-coding-system-alist
     This variable is an alist that specifies the coding systems to use
     for reading and writing particular files.  Each element has the
     form `(PATTERN . CODING)', where PATTERN is a regular expression
     that matches certain file names.  The element applies to file
     names that match PATTERN.

     The CDR of the element, CODING, should be either a coding system,
     a cons cell containing two coding systems, or a function name (a
     symbol with a function definition).  If CODING is a coding system,
     that coding system is used for both reading the file and writing
     it.  If CODING is a cons cell containing two coding systems, its
     CAR specifies the coding system for decoding, and its CDR
     specifies the coding system for encoding.

     If CODING is a function name, the function should take one
     argument, a list of all arguments passed to
     `find-operation-coding-system'.  It must return a coding system or
     a cons cell containing two coding systems.  This value has the same
     meaning as described above.

     If CODING (or the value returned by the above function) is
     `undecided', the normal code-detection is performed.
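
     As a minimal sketch of the element forms described above, the
     following code adds two entries to a `let'-bound copy of the alist
     and looks them up.  The file extensions `.gr-latin1' and
     `.gr-mixed' and the function name `gr-example-file-coding' are
     hypothetical, chosen only for illustration.

```elisp
;; Hypothetical helper: look up FILE in a locally extended copy of
;; `file-coding-system-alist'.  Both file extensions are made up.
(defun gr-example-file-coding (file)
  "Return the (DECODING . ENCODING) pair that would apply to FILE."
  (let ((file-coding-system-alist
         (append
          ;; A single coding system: used for reading and writing.
          '(("\\.gr-latin1\\'" . latin-1)
            ;; A cons cell: CAR for decoding, CDR for encoding.
            ("\\.gr-mixed\\'" . (utf-8 . latin-1)))
          file-coding-system-alist)))
    (find-operation-coding-system 'insert-file-contents file)))

(gr-example-file-coding "/tmp/notes.gr-latin1") ; (latin-1 . latin-1)
(gr-example-file-coding "/tmp/notes.gr-mixed")  ; (utf-8 . latin-1)
```

     Binding the variable with `let' keeps the sketch free of side
     effects; an init file would more likely change the global value.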

 -- User Option: auto-coding-alist
     This variable is an alist that specifies the coding systems to use
     for reading and writing particular files.  Its form is like that of
     `file-coding-system-alist', but, unlike the latter, this variable
     takes priority over any `coding:' tags in the file.

 -- Variable: process-coding-system-alist
     This variable is an alist specifying which coding systems to use
     for a subprocess, depending on which program is running in the
     subprocess.  It works like `file-coding-system-alist', except that
     PATTERN is matched against the program name used to start the
     subprocess.  The coding system or systems specified in this alist
     are used to initialize the coding systems used for I/O to the
     subprocess, but you can specify other coding systems later using
     `set-process-coding-system'.

   *Warning:* Coding systems such as `undecided', which determine the
coding system from the data, do not work entirely reliably with
asynchronous subprocess output.  This is because Emacs handles
asynchronous subprocess output in batches, as it arrives.  If the coding
system leaves the character code conversion unspecified, or leaves the
end-of-line conversion unspecified, Emacs must try to detect the proper
conversion from one batch at a time, and this does not always work.

   Therefore, with an asynchronous subprocess, if at all possible, use a
coding system which determines both the character code conversion and
the end-of-line conversion--that is, one like `latin-1-unix', rather
than `undecided' or `latin-1'.
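
   One possible way to check this property from Lisp is sketched below.
It relies on `coding-system-eol-type', which returns an integer when the
end-of-line conversion is fixed and a vector otherwise, and on
`coding-system-type'.  The function name `gr-example-fully-specified-p'
is hypothetical.

```elisp
;; Hypothetical predicate, not part of Emacs.
(defun gr-example-fully-specified-p (coding)
  "Return non-nil if CODING fixes both character code and eol conversion."
  (and (coding-system-p coding)
       ;; An integer eol type (0 = unix, 1 = dos, 2 = mac) means the
       ;; end-of-line conversion is fixed; a vector means it is not.
       (integerp (coding-system-eol-type coding))
       ;; Coding systems of type `undecided' detect the character
       ;; code from the data.
       (not (eq (coding-system-type coding) 'undecided))))

(gr-example-fully-specified-p 'latin-1-unix) ; non-nil
(gr-example-fully-specified-p 'latin-1)      ; nil
(gr-example-fully-specified-p 'undecided)    ; nil
```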

 -- Variable: network-coding-system-alist
     This variable is an alist that specifies the coding system to use
     for network streams.  It works much like
     `file-coding-system-alist', with the difference that the PATTERN
     in an element may be either a port number or a regular expression.
     If it is a regular expression, it is matched against the network
     service name used to open the network stream.

 -- Variable: default-process-coding-system
     This variable specifies the coding systems to use for subprocess
     (and network stream) input and output, when nothing else specifies
     what to do.

     The value should be a cons cell of the form `(INPUT-CODING .
     OUTPUT-CODING)'.  Here INPUT-CODING applies to input from the
     subprocess, and OUTPUT-CODING applies to output to it.

 -- User Option: auto-coding-functions
     This variable holds a list of functions that try to determine a
     coding system for a file based on its undecoded contents.

     Each function in this list should be written to look at text in the
     current buffer, but should not modify it in any way.  The buffer
     will contain undecoded text of parts of the file.  Each function
     should take one argument, SIZE, which tells it how many characters
     to look at, starting from point.  If the function succeeds in
     determining a coding system for the file, it should return that
     coding system.  Otherwise, it should return `nil'.

     If a file has a `coding:' tag, that takes precedence, so these
     functions won't be called.
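
     A function that follows this calling convention might look like
     the sketch below.  The marker string `@gr-latin1@' and the
     function name `gr-example-auto-coding' are invented for this
     example; Emacs itself does not recognize such a marker.

```elisp
;; Hypothetical detector following the `auto-coding-functions' contract.
(defun gr-example-auto-coding (size)
  "Return `latin-1' if \"@gr-latin1@\" occurs within SIZE chars of point.
Return nil otherwise.  The buffer and point are left unchanged."
  (save-excursion
    (save-match-data
      (let ((limit (min (point-max) (+ (point) size)))
            (case-fold-search nil))
        ;; Searching only reads the buffer; it never modifies the text.
        (when (search-forward "@gr-latin1@" limit t)
          'latin-1)))))
```

     To try it out, one could add the symbol `gr-example-auto-coding'
     to `auto-coding-functions'.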

 -- Function: find-auto-coding filename size
     This function tries to determine a suitable coding system for
     FILENAME.  It examines the buffer visiting the named file, using
     the variables documented above in sequence, until it finds a match
     for one of the rules specified by these variables.  It then
     returns a cons cell of the form `(CODING . SOURCE)', where CODING
     is the coding system to use and SOURCE is a symbol, one of
     `auto-coding-alist', `auto-coding-regexp-alist', `:coding', or
     `auto-coding-functions', indicating which one supplied the
     matching rule.  The value `:coding' means the coding system was
     specified by the `coding:' tag in the file (*note coding tag:
     (emacs)Specify Coding.).  The order of looking for a matching rule
     is `auto-coding-alist' first, then `auto-coding-regexp-alist',
     then the `coding:' tag, and lastly `auto-coding-functions'.  If no
     matching rule was found, the function returns `nil'.

     The second argument SIZE is the size of text, in characters,
     following point.  The function examines text only within SIZE
     characters after point.  Normally, the buffer should be positioned
     at the beginning when this function is called, because one of the
     places for the `coding:' tag is the first one or two lines of the
     file; in that case, SIZE should be the size of the buffer.
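
     The lookup order can be observed with a small experiment such as
     the following sketch.  The helper name
     `gr-example-find-auto-coding' and the file name `example.txt' are
     hypothetical; no file of that name is read, since the function
     examines the text of the current buffer.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun gr-example-find-auto-coding (text &optional regexp-alist)
  "Run `find-auto-coding' on TEXT in a temporary buffer.
If REGEXP-ALIST is non-nil, use it as `auto-coding-regexp-alist'."
  (let ((auto-coding-regexp-alist
         (or regexp-alist auto-coding-regexp-alist)))
    (with-temp-buffer
      (insert text)
      ;; Start at the beginning and let SIZE cover the whole buffer.
      (goto-char (point-min))
      (find-auto-coding "example.txt" (buffer-size)))))

;; A `coding:' tag on the first line:
(gr-example-find-auto-coding ";; -*- coding: latin-1 -*-\nbody\n")
;; should return (latin-1 . :coding)

;; A matching `auto-coding-regexp-alist' entry is tried before the tag:
(gr-example-find-auto-coding ";; -*- coding: latin-1 -*-\nbody\n"
                             '(("\\`;; -\\*-" . utf-8)))
;; should return (utf-8 . auto-coding-regexp-alist)
```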

 -- Function: set-auto-coding filename size
     This function returns a suitable coding system for file FILENAME.
     It uses `find-auto-coding' to find the coding system.  If no
     coding system could be determined, the function returns `nil'.
     The meaning of the argument SIZE is the same as in
     `find-auto-coding'.

 -- Function: find-operation-coding-system operation &rest arguments
     This function returns the coding system to use (by default) for
     performing OPERATION with ARGUMENTS.  The value has this form:

          (DECODING-SYSTEM . ENCODING-SYSTEM)

     The first element, DECODING-SYSTEM, is the coding system to use
     for decoding (in case OPERATION does decoding), and
     ENCODING-SYSTEM is the coding system for encoding (in case
     OPERATION does encoding).

     The argument OPERATION is a symbol, one of `write-region',
     `start-process', `call-process', `call-process-region',
     `insert-file-contents', or `open-network-stream'.  These are the
     names of the Emacs I/O primitives that can do character code and
     eol conversion.

     The remaining arguments should be the same arguments that might be
     given to the corresponding I/O primitive.  Depending on the
     primitive, one of those arguments is selected as the "target".
     For example, if OPERATION does file I/O, whichever argument
     specifies the file name is the target.  For subprocess primitives,
     the name of the program to run is the target.  For
     `open-network-stream', the target is the service name or port
     number.

     Depending on OPERATION, this function looks up the target in
     `file-coding-system-alist', `process-coding-system-alist', or
     `network-coding-system-alist'.  If the target is found in the
     alist, `find-operation-coding-system' returns its association in
     the alist; otherwise it returns `nil'.
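
     The following sketch shows such lookups for a subprocess and for
     a network stream, using `let'-bound alists so that nothing is
     changed globally.  The program name `gr-tool', the port numbers,
     and the function name `gr-example-operation-codings' are
     assumptions made for the example.  No process or connection is
     actually started.

```elisp
;; Hypothetical demonstration function, not part of Emacs.
(defun gr-example-operation-codings ()
  "Return sample results of `find-operation-coding-system'."
  (let ((process-coding-system-alist
         ;; "gr-tool" is a made-up program name.
         '(("\\`gr-tool\\'" . (latin-1-unix . utf-8-unix))))
        (network-coding-system-alist
         ;; Port 8025 is an arbitrary number chosen for the example.
         '((8025 . utf-8))))
    (list
     ;; The arguments are those of `start-process': NAME BUFFER PROGRAM.
     (find-operation-coding-system 'start-process "proc" nil "gr-tool")
     ;; The arguments are those of `open-network-stream'; the last
     ;; one is the service, here a port number.
     (find-operation-coding-system 'open-network-stream
                                   "conn" nil "localhost" 8025)
     ;; No entry matches port 8026, so the result is nil.
     (find-operation-coding-system 'open-network-stream
                                   "conn" nil "localhost" 8026))))

(gr-example-operation-codings)
;; => ((latin-1-unix . utf-8-unix) (utf-8 . utf-8) nil)
```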

     If OPERATION is `insert-file-contents', the argument corresponding
     to the target may be a cons cell of the form `(FILENAME .
     BUFFER)'.  In that case, FILENAME is a file name to look up in
     `file-coding-system-alist', and BUFFER is a buffer that contains
     the file's contents (not yet decoded).  If
     `file-coding-system-alist' specifies a function to call for this
     file, and that function needs to examine the file's contents (as
     it usually does), it should examine the contents of BUFFER instead
     of reading the file.


File: elisp,  Node: Specifying Coding Systems,  Next: Explicit Encoding,  Prev: Default Coding Systems,  Up: Coding Systems

33.9.6 Specifying a Coding System for One Operation
---------------------------------------------------

You can specify the coding system for a specific operation by binding
the variables `coding-system-for-read' and/or `coding-system-for-write'.

 -- Variable: coding-system-for-read
     If this variable is non-`nil', it specifies the coding system to
     use for reading a file, or for input from a synchronous subprocess.

     It also applies to any asynchronous subprocess or network stream,
     but in a different way: the value of `coding-system-for-read' when
     you start the subprocess or open the network stream specifies the
     input decoding method for that subprocess or network stream.  It
     remains in use for that subprocess or network stream unless and
     until overridden.

     The right way to use this variable is to bind it with `let' for a
     specific I/O operation.  Its global value is normally `nil', and
     you should not globally set it to any other value.  Here is an
     example of the right way to use the variable:

          ;; Read the file with no character code conversion.
          ;; Assume crlf represents end-of-line.
          (let ((coding-system-for-read 'raw-text-dos))
            (insert-file-contents filename))

     When its value is non-`nil', this variable takes precedence over
     all other methods of specifying a coding system to use for input,
     including `file-coding-system-alist',
     `process-coding-system-alist' and `network-coding-system-alist'.

 -- Variable: coding-system-for-write
     This works much like `coding-system-for-read', except that it
     applies to output rather than input.  It affects writing to files,
     as well as sending output to subprocesses and net connections.

     When a single operation does both input and output, as do
     `call-process-region' and `start-process', both
     `coding-system-for-read' and `coding-system-for-write' affect it.
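
     As an illustration of binding both variables around individual
     I/O operations, the sketch below writes a string to a temporary
     file with one coding system and reads it back with another.  The
     function name `gr-example-round-trip' is hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun gr-example-round-trip (text write-coding read-coding)
  "Write TEXT using WRITE-CODING, then read it back using READ-CODING.
Return the text as read, without text properties."
  (let ((file (make-temp-file "gr-example-")))
    (unwind-protect
        (progn
          ;; Bind the variable only around the one output operation.
          (let ((coding-system-for-write write-coding))
            (write-region text nil file nil 'silent))
          (with-temp-buffer
            ;; Likewise for the one input operation.
            (let ((coding-system-for-read read-coding))
              (insert-file-contents file))
            (buffer-substring-no-properties (point-min) (point-max))))
      (delete-file file))))

;; Same coding system both ways: the text survives unchanged.
(gr-example-round-trip "caf\u00e9" 'latin-1-unix 'latin-1-unix)

;; Written as UTF-8 but read as Latin-1: the two bytes of the
;; accented letter come back as two separate characters.
(gr-example-round-trip "caf\u00e9" 'utf-8-unix 'latin-1-unix)
```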

 -- User Option: inhibit-eol-conversion
     When this variable is non-`nil', no end-of-line conversion is done,
     no matter which coding system is specified.  This applies to all
     the Emacs I/O and subprocess primitives, and to the explicit
     encoding and decoding functions (*note Explicit Encoding::).

   Sometimes, you need to prefer several coding systems for some
operation, rather than fix a single one.  Emacs lets you specify a
priority order for using coding systems.  This ordering affects the
sorting of lists of coding systems returned by functions such as
`find-coding-systems-region' (*note Lisp and Coding Systems::).

 -- Function: coding-system-priority-list &optional highestp
     This function returns the list of coding systems in the order of
     their current priorities.  Optional argument HIGHESTP, if
     non-`nil', means return only the highest priority coding system.

 -- Function: set-coding-system-priority &rest coding-systems
     This function puts CODING-SYSTEMS at the beginning of the priority
     list for coding systems, thus making their priority higher than
     all the rest.

 -- Macro: with-coding-priority coding-systems &rest body...
     This macro executes BODY, like `progn' does (*note progn:
     Sequencing.), with CODING-SYSTEMS at the front of the priority
     list for coding systems.  CODING-SYSTEMS should be a list of
     coding systems to prefer during execution of BODY.
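
     A minimal sketch of its use follows.  The function name
     `gr-example-highest-priority-within' is hypothetical; the body
     merely reports which coding system is at the front of the
     priority list while the temporary ordering is in effect.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun gr-example-highest-priority-within (coding-systems)
  "Return the top-priority coding system while CODING-SYSTEMS are preferred."
  (with-coding-priority coding-systems
    (coding-system-priority-list t)))

(gr-example-highest-priority-within '(utf-8)) ; expected: utf-8
```

     The previous priority order is restored when the body exits, so
     this form is preferable to calling `set-coding-system-priority'
     directly when the change is meant to be temporary.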


File: elisp,  Node: Explicit Encoding,  Next: Terminal I/O Encoding,  Prev: Specifying Coding Systems,  Up: Coding Systems

33.9.7 Explicit Encoding and Decoding
-------------------------------------

All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text.  You can
also explicitly encode and decode text using the functions in this
section.

   The result of encoding, and the input to decoding, are not ordinary
text.  They logically consist of a series of byte values; that is, a
series of ASCII and eight-bit characters.  In unibyte buffers and
strings, these characters have codes in the range 0 through #xFF (255).
In a multibyte buffer or string, eight-bit characters have character
codes higher than #xFF (*note Text Representations::), but Emacs
transparently converts them to their single-byte values when you encode
or decode such text.

   The usual way to read a file into a buffer as a sequence of bytes, so
you can decode the contents explicitly, is with
`insert-file-contents-literally' (*note Reading from Files::);
alternatively, specify a non-`nil' RAWFILE argument when visiting a
file with `find-file-noselect'.  These methods result in a unibyte
buffer.

   The usual way to use the byte sequence that results from explicitly
encoding text is to copy it to a file or process--for example, to write
it with `write-region' (*note Writing to Files::), and suppress
encoding by binding `coding-system-for-write' to `no-conversion'.

   Here are the functions to perform explicit encoding or decoding.  The
encoding functions produce sequences of bytes; the decoding functions
are meant to operate on sequences of bytes.  All of these functions
discard text properties.  They also set `last-coding-system-used' to
the precise coding system they used.

 -- Command: encode-coding-region start end coding-system &optional
          destination
     This command encodes the text from START to END according to
     coding system CODING-SYSTEM.  Normally, the encoded text replaces
     the original text in the buffer, but the optional argument
     DESTINATION can change that.  If DESTINATION is a buffer, the
     encoded text is inserted in that buffer after point (point does
     not move); if it is `t', the command returns the encoded text as a
     unibyte string without inserting it.

     If encoded text is inserted in some buffer, this command returns
     the length of the encoded text.

     The result of encoding is logically a sequence of bytes, but the
     buffer remains multibyte if it was multibyte before, and any 8-bit
     bytes are converted to their multibyte representation (*note Text
     Representations::).

     Do _not_ use `undecided' for CODING-SYSTEM when encoding text,
     since that may lead to unexpected results.  Instead, use
     `select-safe-coding-system' (*note select-safe-coding-system:
     User-Chosen Coding Systems.) to suggest a suitable encoding, if
     there's no obvious pertinent value for CODING-SYSTEM.
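
     The effect of passing `t' as DESTINATION can be sketched as
     follows; the function name `gr-example-encode-buffer-text' is
     hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun gr-example-encode-buffer-text (text coding)
  "Return TEXT encoded with CODING, using a temporary buffer."
  (with-temp-buffer
    (insert text)
    ;; DESTINATION t: return the bytes as a unibyte string and leave
    ;; the buffer text untouched.
    (encode-coding-region (point-min) (point-max) coding t)))

(gr-example-encode-buffer-text "Gr\u00fc\u00df" 'utf-8)
;; => "Gr\303\274\303\237"
```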

 -- Function: encode-coding-string string coding-system &optional
          nocopy buffer
     This function encodes the text in STRING according to coding
     system CODING-SYSTEM.  It returns a new string containing the
     encoded text, except when NOCOPY is non-`nil', in which case the
     function may return STRING itself if the encoding operation is
     trivial.  The result of encoding is a unibyte string.
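
     Because the result is a unibyte string, its length is the number
     of bytes in the encoded text.  The following sketch uses that to
     compare two encodings; the function name
     `gr-example-encoded-length' is hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun gr-example-encoded-length (string coding)
  "Return the number of bytes STRING occupies when encoded with CODING."
  (length (encode-coding-string string coding)))

(gr-example-encoded-length "caf\u00e9" 'latin-1) ; 4
(gr-example-encoded-length "caf\u00e9" 'utf-8)   ; 5
```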

 -- Command: decode-coding-region start end coding-system &optional
          destination
     This command decodes the text from START to END according to
     coding system CODING-SYSTEM.  To make explicit decoding useful,
     the text before decoding ought to be a sequence of byte values,
     but both multibyte and unibyte buffers are acceptable (in the
     multibyte case, the raw byte values should be represented as
     eight-bit characters).  Normally, the decoded text replaces the
     original text in the buffer, but the optional argument DESTINATION
     can change that.  If DESTINATION is a buffer, the decoded text is
     inserted in that buffer after point (point does not move); if it
     is `t', the command returns the decoded text as a multibyte string
     without inserting it.

     If decoded text is inserted in some buffer, this command returns
     the length of the decoded text.

     This command puts a `charset' text property on the decoded text.
     The value of the property states the character set used to decode
     the original text.
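
     As a sketch of decoding raw bytes held in a unibyte buffer, the
     following hypothetical function `gr-example-decode-bytes' inserts
     a unibyte string into a temporary unibyte buffer and returns the
     decoded text as a string.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun gr-example-decode-bytes (bytes coding)
  "Decode the unibyte string BYTES with CODING via a unibyte buffer."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert bytes)
    ;; DESTINATION t: return the decoded text as a multibyte string
    ;; instead of replacing the bytes in the buffer.
    (decode-coding-region (point-min) (point-max) coding t)))

;; \374 and \337 are the Latin-1 bytes for u-umlaut and sharp s.
(gr-example-decode-bytes "Gr\374\337" 'latin-1)
```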

 -- Function: decode-coding-string string coding-system &optional
          nocopy buffer
     This function decodes the text in STRING according to
     CODING-SYSTEM.  It returns a new string containing the decoded
     text, except when NOCOPY is non-`nil', in which case the function
     may return STRING itself if the decoding operation is trivial.  To
     make explicit decoding useful, the contents of STRING ought to be
     a unibyte string with a sequence of byte values, but a multibyte
     string is also acceptable (assuming it contains 8-bit bytes in
     their multibyte form).

     If optional argument BUFFER specifies a buffer, the decoded text
     is inserted in that buffer after point (point does not move).  In
     this case, the return value is the length of the decoded text.

     This function puts a `charset' text property on the decoded text.
     The value of the property states the character set used to decode
     the original text:

          (decode-coding-string "Gr\374ss Gott" 'latin-1)
               => #("Grüss Gott" 0 9 (charset iso-8859-1))

 -- Function: decode-coding-inserted-region from to filename &optional
          visit beg end replace
     This function decodes the text from FROM to TO as if it were being
     read from file FILENAME by `insert-file-contents', called with the
     rest of the arguments provided.

     The normal way to use this function is after reading text from a
     file without decoding, if you decide you would rather have decoded
     it.  Instead of deleting the text and reading it again, this time
     with decoding, you can call this function.


File: elisp,  Node: Terminal I/O Encoding,  Next: MS-DOS File Types,  Prev: Explicit Encoding,  Up: Coding Systems

33.9.8 Terminal I/O Encoding
----------------------------

Emacs can decode keyboard input using a coding system, and encode
terminal output.  This is useful for terminals that transmit or display
text using a particular encoding such as Latin-1.  Emacs does not set
`last-coding-system-used' for encoding or decoding of terminal I/O.

 -- Function: keyboard-coding-system &optional terminal
     This function returns the coding system that is in use for decoding
     keyboard input from TERMINAL--or `nil' if no coding system is to
     be used for that terminal.  If TERMINAL is omitted or `nil', it
     means the selected frame's terminal.  *Note Multiple Terminals::.

 -- Command: set-keyboard-coding-system coding-system &optional terminal
     This command specifies CODING-SYSTEM as the coding system to use
     for decoding keyboard input from TERMINAL.  If CODING-SYSTEM is
     `nil', that means do not decode keyboard input.  If TERMINAL is a
     frame, it means that frame's terminal; if it is `nil', that means
     the currently selected frame's terminal.  *Note Multiple
     Terminals::.

 -- Function: terminal-coding-system &optional terminal
     This function returns the coding system that is in use for encoding
     terminal output from TERMINAL--or `nil' if the output is not
     encoded.  If TERMINAL is a frame, it means that frame's terminal;
     if it is `nil', that means the currently selected frame's terminal.

 -- Command: set-terminal-coding-system coding-system &optional terminal
     This command specifies CODING-SYSTEM as the coding system to use
     for encoding terminal output from TERMINAL.  If CODING-SYSTEM is
     `nil', terminal output is not encoded.  If TERMINAL is a frame, it
     means that frame's terminal; if it is `nil', that means the
     currently selected frame's terminal.


File: elisp,  Node: MS-DOS File Types,  Prev: Terminal I/O Encoding,  Up: Coding Systems

33.9.9 MS-DOS File Types
------------------------

On MS-DOS and Microsoft Windows, Emacs guesses the appropriate
end-of-line conversion for a file by looking at the file's name.  This
feature classifies files as "text files" and "binary files".  By
"binary file" we mean a file of literal byte values that are not
necessarily meant to be characters; Emacs does no end-of-line conversion
and no character code conversion for them.  On the other hand, the bytes
in a text file are intended to represent characters; when you create a
new file whose name implies that it is a text file, Emacs uses DOS
end-of-line conversion.

 -- Variable: buffer-file-type
     This variable, automatically buffer-local in each buffer, records
     the file type of the buffer's visited file.  When a buffer does
     not specify a coding system with `buffer-file-coding-system', this
     variable is used to determine which coding system to use when
     writing the contents of the buffer.  It should be `nil' for text,
     `t' for binary.  If it is `t', the coding system is
     `no-conversion'.  Otherwise, `undecided-dos' is used.

     Normally this variable is set by visiting a file; it is set to
     `nil' if the file was visited without any actual conversion.

     Its default value is used to decide how to handle files for which
     `file-name-buffer-file-type-alist' says nothing about the type: If
     the default value is non-`nil', then these files are treated as
     binary: the coding system `no-conversion' is used.  Otherwise,
     nothing special is done for them--the coding system is deduced
     solely from the file contents, in the usual Emacs fashion.

 -- User Option: file-name-buffer-file-type-alist
     This variable holds an alist for recognizing text and binary files.
     Each element has the form `(REGEXP . TYPE)', where REGEXP is
     matched against the file name, and TYPE may be `nil' for text, `t'
     for binary, or a function to call to compute which.  If it is a
     function, then it is called with a single argument (the file name)
     and should return `t' or `nil'.

     When running on MS-DOS or MS-Windows, Emacs checks this alist to
     decide which coding system to use when reading a file.  For a text
     file, `undecided-dos' is used.  For a binary file, `no-conversion'
     is used.

     If no element in this alist matches a given file name, then the
     default value of `buffer-file-type' says how to treat the file.


File: elisp,  Node: Input Methods,  Next: Locales,  Prev: Coding Systems,  Up: Non-ASCII Characters

33.10 Input Methods
===================

"Input methods" provide convenient ways of entering non-ASCII
characters from the keyboard.  Unlike coding systems, which translate
non-ASCII characters to and from encodings meant to be read by
programs, input methods provide human-friendly commands.  (*Note Input
Methods: (emacs)Input Methods, for information on how users use input
methods to enter text.)  How to define input methods is not yet
documented in this manual, but here we describe how to use them.

   Each input method has a name, which is currently a string; in the
future, symbols may also be usable as input method names.

 -- Variable: current-input-method
     This variable holds the name of the input method now active in the
     current buffer.  (It automatically becomes local in each buffer
     when set in any fashion.)  It is `nil' if no input method is
     active in the buffer now.

 -- User Option: default-input-method
     This variable holds the default input method for commands that
     choose an input method.  Unlike `current-input-method', this
     variable is normally global.

 -- Command: set-input-method input-method
     This command activates input method INPUT-METHOD for the current
     buffer.  It also sets `default-input-method' to INPUT-METHOD.  If
     INPUT-METHOD is `nil', this command deactivates any input method
     for the current buffer.

 -- Function: read-input-method-name prompt &optional default
          inhibit-null
     This function reads an input method name with the minibuffer,
     prompting with PROMPT.  If DEFAULT is non-`nil', that is returned
     by default, if the user enters empty input.  However, if
     INHIBIT-NULL is non-`nil', empty input signals an error.

     The returned value is a string.

 -- Variable: input-method-alist
     This variable defines all the supported input methods.  Each
     element defines one input method, and should have the form:

          (INPUT-METHOD LANGUAGE-ENV ACTIVATE-FUNC
           TITLE DESCRIPTION ARGS...)

     Here INPUT-METHOD is the input method name, a string; LANGUAGE-ENV
     is another string, the name of the language environment this input
     method is recommended for.  (That serves only for documentation
     purposes.)

     ACTIVATE-FUNC is a function to call to activate this method.  The
     ARGS, if any, are passed as arguments to ACTIVATE-FUNC.  All told,
     the arguments to ACTIVATE-FUNC are INPUT-METHOD and the ARGS.

     TITLE is a string to display in the mode line while this method is
     active.  DESCRIPTION is a string describing this method and what
     it is good for.
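
     The element layout described above can be unpacked as in the
     following sketch.  The function name
     `gr-example-input-method-info' and the property names in the
     returned list are choices made for this example only.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun gr-example-input-method-info (name)
  "Return a property list describing input method NAME, or nil if unknown."
  (let ((entry (assoc name input-method-alist)))
    (when entry
      (list :language (nth 1 entry)      ; LANGUAGE-ENV
            :activate (nth 2 entry)      ; ACTIVATE-FUNC
            :title (nth 3 entry)         ; TITLE
            :description (nth 4 entry)   ; DESCRIPTION
            :args (nthcdr 5 entry)))))   ; ARGS...

;; The result depends on which input methods are installed:
(gr-example-input-method-info "latin-1-prefix")
```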

   The fundamental interface to input methods is through the variable
`input-method-function'.  *Note Reading One Event::, and *note Invoking
the Input Method::.


File: elisp,  Node: Locales,  Prev: Input Methods,  Up: Non-ASCII Characters

33.11 Locales
=============

POSIX defines a concept of "locales" which control which language to
use in language-related features.  These Emacs variables control how
Emacs interacts with these features.

 -- Variable: locale-coding-system
     This variable specifies the coding system to use for decoding
     system error messages and--on X Window system only--keyboard
     input, for encoding the format argument to `format-time-string',
     and for decoding the return value of `format-time-string'.

 -- Variable: system-messages-locale
     This variable specifies the locale to use for generating system
     error messages.  Changing the locale can cause messages to come
     out in a different language or in a different orthography.  If the
     variable is `nil', the locale is specified by environment
     variables in the usual POSIX fashion.

 -- Variable: system-time-locale
     This variable specifies the locale to use for formatting time
     values.  Changing the locale can cause messages to appear
     according to the conventions of a different language.  If the
     variable is `nil', the locale is specified by environment
     variables in the usual POSIX fashion.

 -- Function: locale-info item
     This function returns locale data ITEM for the current POSIX
     locale, if available.  ITEM should be one of these symbols:

    `codeset'
          Return the character set as a string (locale item `CODESET').

    `days'
          Return a 7-element vector of day names (locale items `DAY_1'
          through `DAY_7').

    `months'
          Return a 12-element vector of month names (locale items
          `MON_1' through `MON_12').

    `paper'
          Return a list `(WIDTH HEIGHT)' for the default paper size
          measured in millimeters (locale items `PAPER_WIDTH' and
          `PAPER_HEIGHT').

     If the system can't provide the requested information, or if ITEM
     is not one of those symbols, the value is `nil'.  All strings in
     the return value are decoded using `locale-coding-system'.  *Note
     Locales: (libc)Locales, for more information about locales and
     locale items.
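
     Since the value may be `nil', callers usually need a fallback.
     The sketch below shows one way to handle that; the function name
     `gr-example-weekday-names' is hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun gr-example-weekday-names ()
  "Return the locale's weekday names as a list of strings, or nil."
  (let ((days (locale-info 'days)))
    ;; `locale-info' returns nil when the system cannot supply the
    ;; data, so check for a vector before converting it to a list.
    (and (vectorp days) (append days nil))))

;; The result depends on the system and the current locale:
(gr-example-weekday-names)
```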


File: elisp,  Node: Searching and Matching,  Next: Syntax Tables,  Prev: Non-ASCII Characters,  Up: Top

34 Searching and Matching
*************************

GNU Emacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches.  After a
regular expression search, you can examine the "match data" to
determine which text matched the whole regular expression or various
portions of it.

* Menu:

* String Search::         Search for an exact match.
* Searching and Case::    Case-independent or case-significant searching.
* Regular Expressions::   Describing classes of strings.
* Regexp Search::         Searching for a match for a regexp.
* POSIX Regexps::         Searching POSIX-style for the longest match.
* Match Data::            Finding out which part of the text matched,
                            after a string or regexp search.
* Search and Replace::    Commands that loop, searching and replacing.
* Standard Regexps::      Useful regexps for finding sentences, pages,...

   The `skip-chars...' functions also perform a kind of searching.
*Note Skipping Characters::.  To search for changes in character
properties, see *note Property Search::.


File: elisp,  Node: String Search,  Next: Searching and Case,  Up: Searching and Matching

34.1 Searching for Strings
==========================

These are the primitive functions for searching through the text in a
buffer.  They are meant for use in programs, but you may call them
interactively.  If you do so, they prompt for the search string; the
arguments LIMIT and NOERROR are `nil', and REPEAT is 1.

   These search functions convert the search string to multibyte if the
buffer is multibyte; they convert the search string to unibyte if the
buffer is unibyte.  *Note Text Representations::.

 -- Command: search-forward string &optional limit noerror repeat
     This function searches forward from point for an exact match for
     STRING.  If successful, it sets point to the end of the occurrence
     found, and returns the new value of point.  If no match is found,
     the value and side effects depend on NOERROR (see below).

     In the following example, point is initially at the beginning of
     the line.  Then `(search-forward "fox")' moves point after the last
     letter of `fox':

          ---------- Buffer: foo ----------
          -!-The quick brown fox jumped over the lazy dog.
          ---------- Buffer: foo ----------

          (search-forward "fox")
               => 20

          ---------- Buffer: foo ----------
          The quick brown fox-!- jumped over the lazy dog.
          ---------- Buffer: foo ----------

     The argument LIMIT specifies the upper bound to the search.  (It
     must be a position in the current buffer.)  No match extending
     after that position is accepted.  If LIMIT is omitted or `nil', it
     defaults to the end of the accessible portion of the buffer.

     What happens when the search fails depends on the value of
     NOERROR.  If NOERROR is `nil', a `search-failed' error is
     signaled.  If NOERROR is `t', `search-forward' returns `nil' and
     does nothing.  If NOERROR is neither `nil' nor `t', then
     `search-forward' moves point to the upper bound and returns `nil'.
     (It would be more consistent now to return the new position of
     point in that case, but some existing programs may depend on a
     value of `nil'.)

     The argument NOERROR only affects valid searches which fail to
     find a match.  Invalid arguments cause errors regardless of
     NOERROR.

     If REPEAT is supplied (it must be a positive number), then the
     search is repeated that many times (each search starting at the
     end of the previous match).  If these successive searches
     succeed, the function succeeds, moving point and returning its new
     value.  Otherwise the search fails, with results depending on the
     value of NOERROR, as described above.
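
   The effect of LIMIT, NOERROR and REPEAT can be seen in a small
sketch such as the following.  The function name
`demo-search-forward' is hypothetical and exists only for this
illustration; the buffer text is the one used in the example above.

```elisp
;; `demo-search-forward' is a hypothetical name used only in this sketch.
(defun demo-search-forward ()
  "Return a list of results from several `search-forward' calls."
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    (goto-char (point-min))
    (list
     ;; NOERROR is t: the failed search returns nil and point stays at 1.
     (search-forward "cat" nil t)
     (point)
     ;; LIMIT is 10, but a match for "fox" would end at position 20,
     ;; so the search fails.  NOERROR is neither nil nor t, so point
     ;; moves to the limit.
     (search-forward "fox" 10 'move)
     (point)
     ;; REPEAT is 2: starting from position 10, find the second "o".
     (search-forward "o" nil nil 2))))

(demo-search-forward)
     ;; => (nil 1 nil 10 19)
```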

 -- Command: search-backward string &optional limit noerror repeat
     This function searches backward from point for STRING.  It is just
     like `search-forward' except that it searches backwards and leaves
     point at the beginning of the match.
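
   As a brief illustration (the function name `demo-search-backward'
is hypothetical), the following sketch searches backward from the end
of a temporary buffer and shows that both the return value and point
are the position where the match begins:

```elisp
;; `demo-search-backward' is a hypothetical name used only in this sketch.
(defun demo-search-backward ()
  "Search backward for \"fox\"; return the value and the resulting point."
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    ;; After `insert', point is at the end of the buffer.
    (list (search-backward "fox") (point))))

(demo-search-backward)
     ;; => (17 17)
```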

 -- Command: word-search-forward string &optional limit noerror repeat
     This function searches forward from point for a "word" match for
     STRING.  If it finds a match, it sets point to the end of the
     match found, and returns the new value of point.

     Word matching regards STRING as a sequence of words, disregarding
     punctuation that separates them.  It searches the buffer for the
     same sequence of words.  Each word must match a complete word in
     the buffer (searching for the word `ball' does not match the word
     `balls'), but the details of punctuation and spacing are ignored
     (searching for `ball boy' does match `ball.  Boy!').

     In this example, point is initially at the beginning of the
     buffer; the search leaves it between the `y' and the `!'.

          ---------- Buffer: foo ----------
          -!-He said "Please!  Find
          the ball boy!"
          ---------- Buffer: foo ----------

          (word-search-forward "Please find the ball, boy.")
               => 35

          ---------- Buffer: foo ----------
          He said "Please!  Find
          the ball boy-!-!"
          ---------- Buffer: foo ----------

     If LIMIT is non-`nil', it must be a position in the current
     buffer; it specifies the upper bound to the search.  The match
     found must not extend after that position.

     If NOERROR is `nil', then `word-search-forward' signals an error
     if the search fails.  If NOERROR is `t', then it returns `nil'
     instead of signaling an error.  If NOERROR is neither `nil' nor
     `t', it moves point to LIMIT (or the end of the accessible portion
     of the buffer) and returns `nil'.

     If REPEAT is non-`nil', then the search is repeated that many
     times.  Point is positioned at the end of the last match.

 -- Command: word-search-forward-lax string &optional limit noerror
          repeat
     This command is identical to `word-search-forward', except that
     the end of STRING need not match a word boundary unless it ends
     in whitespace.  For instance, searching for `ball boy' matches
     `ball boyee', but does not match `aball boy'.
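
   The difference between the strict and the lax word search can be
tried out with a sketch like the one below.  The helper
`demo-word-search' is hypothetical; it runs the given search function
in a temporary buffer with NOERROR set to `t', and binds
`case-fold-search' to `t' so that `boy' can match `Boy'.

```elisp
;; `demo-word-search' is a hypothetical helper used only in this sketch.
(defun demo-word-search (function string text)
  "Call FUNCTION to search for STRING in a temporary buffer holding TEXT.
Return the position after the match, or nil if the search fails."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t))
      (funcall function string nil t))))

;; Punctuation and spacing between the words are ignored.
(demo-word-search #'word-search-forward "ball boy" "ball.  Boy!")     ; => 11
;; A word must match a complete word in the buffer.
(demo-word-search #'word-search-forward "ball" "balls")               ; => nil
;; The lax variant lets the last word match the start of a longer word...
(demo-word-search #'word-search-forward-lax "ball boy" "ball boyee")  ; => 9
;; ...but the first word must still begin at a word boundary.
(demo-word-search #'word-search-forward-lax "ball boy" "aball boy")   ; => nil
```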

 -- Command: word-search-backward string &optional limit noerror repeat
     This function searches backward from point for a word match to
     STRING.  This function is just like `word-search-forward' except
     that it searches backward and normally leaves point at the
     beginning of the match.

 -- Command: word-search-backward-lax string &optional limit noerror
          repeat
     This command is identical to `word-search-backward', except that
     the end of STRING need not match a word boundary unless it ends
     in whitespace.


File: elisp,  Node: Searching and Case,  Next: Regular Expressions,  Prev: String Search,  Up: Searching and Matching

34.2 Searching and Case
=======================

By default, searches in Emacs ignore the case of the text they are
searching through; if you specify searching for `FOO', then `Foo' or
`foo' is also considered a match.  This applies to regular expressions,
too; thus, `[aB]' would match `a' or `A' or `b' or `B'.

   If you do not want this feature, set the variable `case-fold-search'
to `nil'.  Then all letters must match exactly, including case.  This
is a buffer-local variable; altering the variable affects only the
current buffer.  (*Note Intro to Buffer-Local::.)  Alternatively, you
may change the default value of `case-fold-search'.

   Note that the user-level incremental search feature handles case
distinctions differently.  When the search string contains only lower
case letters, the search ignores case, but when the search string
contains one or more upper case letters, the search becomes
case-sensitive.  But this has nothing to do with the searching
functions used in Lisp code.

 -- User Option: case-fold-search
     This buffer-local variable determines whether searches should
     ignore case.  If the variable is `nil' they do not ignore case;
     otherwise they do ignore case.

 -- User Option: case-replace
     This variable determines whether the higher level replacement
     functions should preserve case.  If the variable is `nil', that
     means to use the replacement text verbatim.  A non-`nil' value
     means to convert the case of the replacement text according to the
     text being replaced.

     This variable is used by passing it as an argument to the function
     `replace-match'.  *Note Replacing Match::.
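
   Lisp code usually controls case sensitivity by binding
`case-fold-search' with `let' around a search, rather than setting it
permanently.  The sketch below shows this, and also shows the effect of
the FIXEDCASE argument of `replace-match', which is assumed here to be
the argument through which a value derived from `case-replace' is
passed.  The function names are hypothetical.

```elisp
;; Both function names are hypothetical and used only in this sketch.
(defun demo-case-search (fold)
  "Search for \"FOO\" in the text \"a foo\" with `case-fold-search' set to FOLD.
Return the position after the match, or nil if there is no match."
  (with-temp-buffer
    (insert "a foo")
    (goto-char (point-min))
    (let ((case-fold-search fold))
      (search-forward "FOO" nil t))))

(defun demo-replace-case (fixedcase)
  "Replace \"Foo\" with \"bar\", passing FIXEDCASE to `replace-match'.
Return the resulting buffer text."
  (with-temp-buffer
    (insert "Foo")
    (goto-char (point-min))
    (let ((case-fold-search t))
      (search-forward "foo")
      (replace-match "bar" fixedcase))
    (buffer-string)))

(demo-case-search t)     ; => 6
(demo-case-search nil)   ; => nil
(demo-replace-case nil)  ; => "Bar"  (case follows the replaced text)
(demo-replace-case t)    ; => "bar"  (replacement used verbatim)
```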


File: elisp,  Node: Regular Expressions,  Next: Regexp Search,  Prev: Searching and Case,  Up: Searching and Matching

34.3 Regular Expressions
========================

A "regular expression", or "regexp" for short, is a pattern that
denotes a (possibly infinite) set of strings.  Searching for matches for
a regexp is a very powerful operation.  This section explains how to
write regexps; the following section says how to search for them.

   For interactive development of regular expressions, you can use the
`M-x re-builder' command.  It provides a convenient interface for
creating regular expressions, giving immediate visual feedback in a
separate buffer.  As you edit the regexp, all its matches
in the target buffer are highlighted.  Each parenthesized
sub-expression of the regexp is shown in a distinct face, which makes
it easier to verify even very complex regexps.

* Menu:

* Syntax of Regexps::       Rules for writing regular expressions.
* Regexp Example::          Illustrates regular expression syntax.
* Regexp Functions::        Functions for operating on regular expressions.


File: elisp,  Node: Syntax of Regexps,  Next: Regexp Example,  Up: Regular Expressions

34.3.1 Syntax of Regular Expressions
------------------------------------

Regular expressions have a syntax in which a few characters are special
constructs and the rest are "ordinary".  An ordinary character is a
simple regular expression that matches that character and nothing else.
The special characters are `.', `*', `+', `?', `[', `^', `$', and `\';
no new special characters will be defined in the future.  The character
`]' is special if it ends a character alternative (see later).  The
character `-' is special inside a character alternative.  A `[:' and
balancing `:]' enclose a character class inside a character
alternative.  Any other character appearing in a regular expression is
ordinary, unless a `\' precedes it.

   For example, `f' is not a special character, so it is ordinary, and
therefore `f' is a regular expression that matches the string `f' and
no other string.  (It does _not_ match the string `fg', but it does
match a _part_ of that string.)  Likewise, `o' is a regular expression
that matches only `o'.

   Any two regular expressions A and B can be concatenated.  The result
is a regular expression that matches a string if A matches some amount
of the beginning of that string and B matches the rest of the string.

   As a simple example, we can concatenate the regular expressions `f'
and `o' to get the regular expression `fo', which matches only the
string `fo'.  Still trivial.  To do something more powerful, you need
to use one of the special regular expression constructs.

* Menu:

* Regexp Special::      Special characters in regular expressions.
* Char Classes::        Character classes used in regular expressions.
* Regexp Backslash::    Backslash-sequences in regular expressions.


File: elisp,  Node: Regexp Special,  Next: Char Classes,  Up: Syntax of Regexps

34.3.1.1 Special Characters in Regular Expressions
..................................................

Here is a list of the characters that are special in a regular
expression.

`.' (Period)
     is a special character that matches any single character except a
     newline.  Using concatenation, we can make regular expressions
     like `a.b', which matches any three-character string that begins
     with `a' and ends with `b'.

`*'
     is not a construct by itself; it is a postfix operator that means
     to match the preceding regular expression repetitively as many
     times as possible.  Thus, `o*' matches any number of `o's
     (including no `o's).

     `*' always applies to the _smallest_ possible preceding
     expression.  Thus, `fo*' has a repeating `o', not a repeating
     `fo'.  It matches `f', `fo', `foo', and so on.

     The matcher processes a `*' construct by matching, immediately, as
     many repetitions as can be found.  Then it continues with the rest
     of the pattern.  If that fails, backtracking occurs, discarding
     some of the matches of the `*'-modified construct in the hope that
     that will make it possible to match the rest of the pattern.  For
     example, in matching `ca*ar' against the string `caaar', the `a*'
     first tries to match all three `a's; but the rest of the pattern is
     `ar' and there is only `r' left to match, so this try fails.  The
     next alternative is for `a*' to match only two `a's.  With this
     choice, the rest of the regexp matches successfully.

     *Warning:* Nested repetition operators can run for an indefinitely
     long time, if they lead to ambiguous matching.  For example,
     trying to match the regular expression `\(x+y*\)*a' against the
     string `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz' could take hours
     before it ultimately fails.  Emacs must try each way of grouping
     the `x's before concluding that none of them can work.  Even
     worse, `\(x*\)*' can match the null string in infinitely many
     ways, so it causes an infinite loop.  To avoid these problems,
     check nested repetitions carefully, to make sure that they do not
     cause combinatorial explosions in backtracking.

`+'
     is a postfix operator, similar to `*' except that it must match
     the preceding expression at least once.  So, for example, `ca+r'
     matches the strings `car' and `caaaar' but not the string `cr',
     whereas `ca*r' matches all three strings.

`?'
     is a postfix operator, similar to `*' except that it must match the
     preceding expression either once or not at all.  For example,
     `ca?r' matches `car' or `cr'; nothing else.

`*?', `+?', `??'
     These are "non-greedy" variants of the operators `*', `+' and `?'.
     Where those operators match the largest possible substring
     (consistent with matching the entire containing expression), the
     non-greedy variants match the smallest possible substring
     (consistent with matching the entire containing expression).

     For example, the regular expression `c[ad]*a' when applied to the
     string `cdaaada' matches the whole string; but the regular
     expression `c[ad]*?a', applied to that same string, matches just
     `cda'.  (The smallest possible match here for `[ad]*?' that
     permits the whole expression to match is `d'.)

`[ ... ]'
     is a "character alternative", which begins with `[' and is
     terminated by `]'.  In the simplest case, the characters between
     the two brackets are what this character alternative can match.

     Thus, `[ad]' matches either one `a' or one `d', and `[ad]*'
     matches any string composed of just `a's and `d's (including the
     empty string), from which it follows that `c[ad]*r' matches `cr',
     `car', `cdr', `caddaar', etc.

     You can also include character ranges in a character alternative,
     by writing the starting and ending characters with a `-' between
     them.  Thus, `[a-z]' matches any lower-case ASCII letter.  Ranges
     may be intermixed freely with individual characters, as in
     `[a-z$%.]', which matches any lower case ASCII letter or `$', `%'
     or period.

     Note that the usual regexp special characters are not special
     inside a character alternative.  A completely different set of
     characters is special inside character alternatives: `]', `-' and
     `^'.

     To include a `]' in a character alternative, you must make it the
     first character.  For example, `[]a]' matches `]' or `a'.  To
     include a `-', write `-' as the first or last character of the
     character alternative, or put it after a range.  Thus, `[]-]'
     matches both `]' and `-'.

     To include `^' in a character alternative, put it anywhere but at
     the beginning.

     The beginning and end of a range of multibyte characters must be in
     the same character set (*note Character Sets::).  Thus,
     `"[\x8e0-\x97c]"' is invalid because character 0x8e0 (`a' with
     grave accent) is in the Emacs character set for Latin-1 but the
     character 0x97c (`u' with diaeresis) is in the Emacs character set
     for Latin-2.  (We use Lisp string syntax to write that example,
     and a few others in the next few paragraphs, in order to include
     hex escape sequences in them.)

     If a range starts with a unibyte character C and ends with a
     multibyte character C2, the range is divided into two parts: one
     is `C..?\377', the other is `C1..C2', where C1 is the first
     character of the charset to which C2 belongs.

     You cannot always match all non-ASCII characters with the regular
     expression `"[\200-\377]"'.  This works when searching a unibyte
     buffer or string (*note Text Representations::), but not in a
     multibyte buffer or string, because many non-ASCII characters have
     codes above octal 0377.  However, the regular expression
     `"[^\000-\177]"' does match all non-ASCII characters (see below
     regarding `^'), in both multibyte and unibyte representations,
     because only the ASCII characters are excluded.

     A character alternative can also specify named character classes
     (*note Char Classes::).  This is a POSIX feature whose syntax is
     `[:CLASS:]'.  Using a character class is equivalent to mentioning
     each of the characters in that class; but the latter is not
     feasible in practice, since some classes include thousands of
     different characters.

`[^ ... ]'
     `[^' begins a "complemented character alternative".  This matches
     any character except the ones specified.  Thus, `[^a-z0-9A-Z]'
     matches all characters _except_ letters and digits.

     `^' is not special in a character alternative unless it is the
     first character.  The character following the `^' is treated as if
     it were first (in other words, `-' and `]' are not special there).

     A complemented character alternative can match a newline, unless
     newline is mentioned as one of the characters not to match.  This
     is in contrast to the handling of regexps in programs such as
     `grep'.

`^'
     When matching a buffer, `^' matches the empty string, but only at
     the beginning of a line in the text being matched (or the
     beginning of the accessible portion of the buffer).  Otherwise it
     fails to match anything.  Thus, `^foo' matches a `foo' that occurs
     at the beginning of a line.

     When matching a string instead of a buffer, `^' matches at the
     beginning of the string or after a newline character.

     For historical compatibility reasons, `^' can be used only at the
     beginning of the regular expression, or after `\(', `\(?:' or `\|'.

`$'
     is similar to `^' but matches only at the end of a line (or the
     end of the accessible portion of the buffer).  Thus, `x+$' matches
     a string of one `x' or more at the end of a line.

     When matching a string instead of a buffer, `$' matches at the end
     of the string or before a newline character.

     For historical compatibility reasons, `$' can be used only at the
     end of the regular expression, or before `\)' or `\|'.

`\'
     has two functions: it quotes the special characters (including
     `\'), and it introduces additional special constructs.

     Because `\' quotes special characters, `\$' is a regular
     expression that matches only `$', and `\[' is a regular expression
     that matches only `[', and so on.

     Note that `\' also has special meaning in the read syntax of Lisp
     strings (*note String Type::), and must be quoted with `\'.  For
     example, the regular expression that matches the `\' character is
     `\\'.  To write a Lisp string that contains the characters `\\',
     Lisp syntax requires you to quote each `\' with another `\'.
     Therefore, the read syntax for a regular expression matching `\'
     is `"\\\\"'.
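
   Several of the examples in the table above can be checked with
`string-match'.  The following sketch uses a hypothetical helper,
`demo-match', which returns the text matched by a regexp; each regexp
is written in Lisp string syntax, so every `\' of the regexp is
doubled.

```elisp
;; `demo-match' is a hypothetical helper used only in this sketch.
(defun demo-match (regexp string)
  "Return the part of STRING matched by REGEXP, or nil if there is no match."
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (match-string 0 string))))

(demo-match "ca*ar" "caaar")            ; => "caaar"  (`a*' backs off to two `a's)
(demo-match "ca+r" "cr")                ; => nil      (`+' needs at least one `a')
(demo-match "ca?r" "cr")                ; => "cr"
(demo-match "c[ad]*a" "cdaaada")        ; => "cdaaada" (greedy)
(demo-match "c[ad]*?a" "cdaaada")       ; => "cda"     (non-greedy)
(demo-match "[]-]+" "a]-]b")            ; => "]-]"
(demo-match "[^a-z0-9A-Z]" "abc def")   ; => " "
(demo-match "x+$" "axx\nb")             ; => "xx"     (`$' matches before a newline)
(demo-match "\\$" "cost: $5")           ; => "$"
(demo-match "\\\\" "a\\b")              ; => "\\"     (one backslash character)
```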

   *Please note:* For historical compatibility, special characters are
treated as ordinary ones if they are in contexts where their special
meanings make no sense.  For example, `*foo' treats `*' as ordinary
since there is no preceding expression on which the `*' can act.  It is
poor practice to depend on this behavior; quote the special character
anyway, regardless of where it appears.
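
   As a small illustration of this rule, both of the following calls
find the literal text `*foo'; the second form, which quotes the `*',
is the one recommended above.

```elisp
;; A leading `*' has no preceding expression, so it is treated as ordinary.
(string-match "*foo" "a*foo")     ; => 1
;; Quoting the `*' says the same thing without relying on that rule.
(string-match "\\*foo" "a*foo")   ; => 1
```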

   As a `\' is not special inside a character alternative, it can never
remove the special meaning of `-' or `]'.  So you should not quote
these characters when they have no special meaning either.  This would
not clarify anything, since backslashes can legitimately precede these
characters where they _have_ special meaning, as in `[^\]' (`"[^\\]"'
for Lisp string syntax), which matches any single character except a
backslash.

   In practice, most `]' that occur in regular expressions close a
character alternative and hence are special.  However, occasionally a
regular expression may try to match a complex pattern of literal `['
and `]'.  In such situations, it sometimes may be necessary to
carefully parse the regexp from the start to determine which square
brackets enclose a character alternative.  For example, `[^][]]'
consists of the complemented character alternative `[^][]' (which
matches any single character that is not a square bracket), followed by
a literal `]'.
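
   The two bracket examples above can be checked as follows.  The
helper `demo-match-bounds' is hypothetical; it returns the start and
end indices of the first match in a string.

```elisp
;; `demo-match-bounds' is a hypothetical helper used only in this sketch.
(defun demo-match-bounds (regexp string)
  "Return (START END) of the first match for REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (list (match-beginning 0) (match-end 0))))

;; "[^][]]" is the alternative "[^][]" followed by a literal "]",
;; so it matches the "x]" in the string "[]x]".
(demo-match-bounds "[^][]]" "[]x]")   ; => (2 4)

;; "[^\\]" (the regexp [^\]) matches any character except a backslash.
;; The string below holds two backslashes followed by "a".
(demo-match-bounds "[^\\]" "\\\\a")   ; => (2 3)
```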

   The exact rules are that at the beginning of a regexp, `[' is
special and `]' is not.  This lasts until the first unquoted `[', after
which we are in a character alternative; `[' is no longer special
(except when it starts a character class) but `]' is special, unless it
immediately follows the special `[' or that `[' followed by a `^'.
This lasts until the next special `]' that does not end a character
class.  This ends the character alternative and restores the ordinary
syntax of regular expressions; an unquoted `[' is special again and a
`]' is not.


File: elisp,  Node: Char Classes,  Next: Regexp Backslash,  Prev: Regexp Special,  Up: Syntax of Regexps

34.3.1.2 Character Classes
..........................

Here is a table of the classes you can use in a character alternative,
and what they mean:

`[:ascii:]'
     This matches any ASCII character (codes 0-127).

`[:alnum:]'
     This matches any letter or digit.  (At present, for multibyte
     characters, it matches anything that has word syntax.)

`[:alpha:]'
     This matches any letter.  (At present, for multibyte characters, it
     matches anything that has word syntax.)

`[:blank:]'
     This matches space and tab only.

`[:cntrl:]'
     This matches any ASCII control character.

`[:digit:]'
     This matches `0' through `9'.  Thus, `[-+[:digit:]]' matches any
     digit, as well as `+' and `-'.

`[:graph:]'
     This matches graphic characters--everything except ASCII control
     characters, space, and the delete character.

`[:lower:]'
     This matches any lower-case letter, as determined by the current
     case table (*note Case Tables::).  If `case-fold-search' is
     non-`nil', this also matches any upper-case letter.

`[:multibyte:]'
     This matches any multibyte character (*note Text
     Representations::).

`[:nonascii:]'
     This matches any non-ASCII character.

`[:print:]'
     This matches printing characters--everything except ASCII control
     characters and the delete character.

`[:punct:]'
     This matches any punctuation character.  (At present, for multibyte
     characters, it matches anything that has non-word syntax.)

`[:space:]'
     This matches any character that has whitespace syntax (*note
     Syntax Class Table::).

`[:unibyte:]'
     This matches any unibyte character (*note Text Representations::).

`[:upper:]'
     This matches any upper-case letter, as determined by the current
     case table (*note Case Tables::).  If `case-fold-search' is
     non-`nil', this also matches any lower-case letter.

`[:word:]'
     This matches any character that has word syntax (*note Syntax
     Class Table::).

`[:xdigit:]'
     This matches the hexadecimal digits: `0' through `9', `a' through
     `f' and `A' through `F'.
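
   Note that a class name must appear inside a character alternative,
so the regexp for "one or more digits" is `[[:digit:]]+', with two
pairs of brackets.  The following sketch tries a few of the classes;
the helper `demo-class-match' is hypothetical, and it binds
`case-fold-search' to `nil' so that `[:upper:]' matches upper-case
letters only.

```elisp
;; `demo-class-match' is a hypothetical helper used only in this sketch.
(defun demo-class-match (regexp string)
  "Return the part of STRING matched by REGEXP, or nil if there is no match."
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (match-string 0 string))))

(demo-class-match "[[:digit:]]+" "abc 123")       ; => "123"
(demo-class-match "[-+[:digit:]]+" "x=-42+7")     ; => "-42+7"
(demo-class-match "[[:xdigit:]]+" "#BEEF99;")     ; => "BEEF99"
(demo-class-match "[[:blank:]]+" "a \t\nb")       ; => " \t"
(demo-class-match "[[:upper:]]+" "abcDEFghi")     ; => "DEF"
;; "\u00e9" is e with acute accent, the only non-ASCII character here.
(demo-class-match "[[:nonascii:]]+" "caf\u00e9!") ; => "\u00e9"
```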


File: elisp,  Node: Regexp Backslash,  Prev: Char Classes,  Up: Syntax of Regexps

34.3.1.3 Backslash Constructs in Regular Expressions
....................................................

For the most part, `\' followed by any character matches only that
character.  However, there are several exceptions: certain
two-character sequences starting with `\' that have special meanings.
(The character after the `\' in such a sequence is always ordinary when
used on its own.)  Here is a table of the special `\' constructs.

`\|'
     specifies an alternative.  Two regular expressions A and B with
     `\|' in between form an expression that matches anything that
     either A or B matches.

     Thus, `foo\|bar' matches either `foo' or `bar' but no other string.

     `\|' applies to the largest possible surrounding expressions.
     Only a surrounding `\( ... \)' grouping can limit the grouping
     power of `\|'.

     If you need full backtracking capability to handle multiple uses of
     `\|', use the POSIX regular expression functions (*note POSIX
     Regexps::).

`\{M\}'
     is a postfix operator that repeats the previous pattern exactly M
     times.  Thus, `x\{5\}' matches the string `xxxxx' and nothing
     else.  `c[ad]\{3\}r' matches strings such as `caaar', `cdddr',
     `cadar', and so on.

`\{M,N\}'
     is a more general postfix operator that specifies repetition with a
     minimum of M repeats and a maximum of N repeats.  If M is omitted,
     the minimum is 0; if N is omitted, there is no maximum.

     For example, `c[ad]\{1,2\}r' matches the strings `car', `cdr',
     `caar', `cadr', `cdar', and `cddr', and nothing else.
     `\{0,1\}' or `\{,1\}' is equivalent to `?'.
     `\{0,\}' or `\{,\}' is equivalent to `*'.
     `\{1,\}' is equivalent to `+'.

`\( ... \)'
     is a grouping construct that serves three purposes:

       1. To enclose a set of `\|' alternatives for other operations.
          Thus, the regular expression `\(foo\|bar\)x' matches either
          `foox' or `barx'.

       2. To enclose a complicated expression for the postfix operators
          `*', `+' and `?' to operate on.  Thus, `ba\(na\)*' matches
          `ba', `bana', `banana', `bananana', etc., with any number
          (zero or more) of `na' strings.

       3. To record a matched substring for future reference with
          `\DIGIT' (see below).

     This last application is not a consequence of the idea of a
     parenthetical grouping; it is a separate feature that was assigned
     as a second meaning to the same `\( ... \)' construct because, in
     practice, there was usually no conflict between the two meanings.
     But occasionally there is a conflict, and that led to the
     introduction of shy groups.

`\(?: ... \)'
     is the "shy group" construct.  A shy group serves the first two
     purposes of an ordinary group (controlling the nesting of other
     operators), but it does not get a number, so you cannot refer back
     to its value with `\DIGIT'.  Shy groups are particularly useful
     for mechanically-constructed regular expressions, because they can
     be added automatically without altering the numbering of ordinary,
     non-shy groups.

     Shy groups are also called "non-capturing" or "unnumbered groups".

`\(?NUM: ... \)'
     is the "explicitly numbered group" construct.  Normal groups get
     their number implicitly, based on their position, which can be
     inconvenient.  This construct allows you to force a particular
     group number.  There is no particular restriction on the
     numbering; e.g., you can have several groups with the same number,
     in which case the last one to match (i.e., the rightmost match) wins.
     Implicitly numbered groups always get the smallest integer larger
     than the one of any previous group.

`\DIGIT'
     matches the same text that matched the DIGITth occurrence of a
     grouping (`\( ... \)') construct.

     In other words, after the end of a group, the matcher remembers the
     beginning and end of the text matched by that group.  Later on in
     the regular expression you can use `\' followed by DIGIT to match
     that same text, whatever it may have been.

     The strings matching the first nine grouping constructs appearing
     in the entire regular expression passed to a search or matching
     function are assigned numbers 1 through 9 in the order that the
     open parentheses appear in the regular expression.  So you can use
     `\1' through `\9' to refer to the text matched by the
     corresponding grouping constructs.

     For example, `\(.*\)\1' matches any newline-free string that is
     composed of two identical halves.  The `\(.*\)' matches the first
     half, which may be anything, but the `\1' that follows must match
     the same exact text.

     If a `\( ... \)' construct matches more than once (which can
     happen, for instance, if it is followed by `*'), only the last
     match is recorded.

     If a particular grouping construct in the regular expression was
     never matched--for instance, if it appears inside of an
     alternative that wasn't used, or inside of a repetition that
     repeated zero times--then the corresponding `\DIGIT' construct
     never matches anything.  To use an artificial example,
     `\(foo\(b*\)\|lose\)\2' cannot match `lose': the second
     alternative inside the larger group matches it, but then `\2' is
     undefined and can't match anything.  But it can match `foobb',
     because the first alternative matches `foob' and `\2' matches `b'.

`\w'
     matches any word-constituent character.  The editor syntax table
     determines which characters these are.  *Note Syntax Tables::.

`\W'
     matches any character that is not a word constituent.

`\sCODE'
     matches any character whose syntax is CODE.  Here CODE is a
     character that represents a syntax code: thus, `w' for word
     constituent, `-' for whitespace, `(' for open parenthesis, etc.
     To represent whitespace syntax, use either `-' or a space
     character.  *Note Syntax Class Table::, for a list of syntax codes
     and the characters that stand for them.

`\SCODE'
     matches any character whose syntax is not CODE.

`\cC'
     matches any character whose category is C.  Here C is a character
     that represents a category: thus, `c' for Chinese characters or
     `g' for Greek characters in the standard category table.

`\CC'
     matches any character whose category is not C.
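
   Many of the constructs in the table above can be exercised with
`string-match' and `match-string'.  In the sketch below the helper
`demo-group' is hypothetical; it returns the text of group N (or of the
whole match when N is omitted).  Remember that in Lisp string syntax
each `\' of the regexp is written `\\'.

```elisp
;; `demo-group' is a hypothetical helper used only in this sketch.
(defun demo-group (regexp string &optional n)
  "Match REGEXP against STRING and return the text of group N.
N defaults to 0, the whole match.  Return nil if there is no match."
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (match-string (or n 0) string))))

;; Alternation and repetition counts.
(demo-group "foo\\|bar" "a bar")                      ; => "bar"
(demo-group "x\\{5\\}" "xxxxxxx")                     ; => "xxxxx"
(demo-group "c[ad]\\{1,2\\}r" "cadr")                 ; => "cadr"
(demo-group "c[ad]\\{1,2\\}r" "caddr")                ; => nil
;; Grouping for a postfix operator.
(demo-group "ba\\(na\\)*" "bananana!")                ; => "bananana"
;; Ordinary, shy and explicitly numbered groups.
(demo-group "\\(foo\\|bar\\)x" "barx" 1)              ; => "bar"
(demo-group "\\(?:foo\\|bar\\)\\(x\\)" "barx" 1)      ; => "x"
(demo-group "\\(?3:a\\)b" "ab" 3)                     ; => "a"
;; Back references.
(demo-group "\\(.*\\)\\1" "abcabc")                   ; => "abcabc"
(demo-group "\\(foo\\(b*\\)\\|lose\\)\\2" "lose")     ; => nil
(demo-group "\\(foo\\(b*\\)\\|lose\\)\\2" "foobb")    ; => "foobb"
;; Syntax-based constructs.
(demo-group "\\w+" "  hello, world")                  ; => "hello"
(demo-group "\\s-+" "a  b")                           ; => "  "
```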

   The following regular expression constructs match the empty
string--that is, they don't use up any characters--but whether they
match depends on the context.  For all, the beginning and end of the
accessible portion of the buffer are treated as if they were the actual
beginning and end of the buffer.

`\`'
     matches the empty string, but only at the beginning of the buffer
     or string being matched against.

`\''
     matches the empty string, but only at the end of the buffer or
     string being matched against.

`\='
     matches the empty string, but only at point.  (This construct is
     not defined when matching against a string.)

`\b'
     matches the empty string, but only at the beginning or end of a
     word.  Thus, `\bfoo\b' matches any occurrence of `foo' as a
     separate word.  `\bballs?\b' matches `ball' or `balls' as a
     separate word.

     `\b' matches at the beginning or end of the buffer (or string)
     regardless of what text appears next to it.

`\B'
     matches the empty string, but _not_ at the beginning or end of a
     word, nor at the beginning or end of the buffer (or string).

`\<'
     matches the empty string, but only at the beginning of a word.
     `\<' matches at the beginning of the buffer (or string) only if a
     word-constituent character follows.

`\>'
     matches the empty string, but only at the end of a word.  `\>'
     matches at the end of the buffer (or string) only if the contents
     end with a word-constituent character.

`\_<'
     matches the empty string, but only at the beginning of a symbol.  A
     symbol is a sequence of one or more word or symbol constituent
     characters.  `\_<' matches at the beginning of the buffer (or
     string) only if a symbol-constituent character follows.

`\_>'
     matches the empty string, but only at the end of a symbol.  `\_>'
     matches at the end of the buffer (or string) only if the contents
     end with a symbol-constituent character.
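
   The following sketch contrasts some of these constructs with each
other and with `^' and `$'.  The helper `demo-match-start' is
hypothetical; it matches against a string using the standard syntax
table (in which `_' is assumed to be a symbol constituent but not a
word constituent), so the results do not depend on the current
buffer's major mode.

```elisp
;; `demo-match-start' is a hypothetical helper used only in this sketch.
(defun demo-match-start (regexp string)
  "Return the index where REGEXP first matches in STRING, or nil."
  (with-syntax-table (standard-syntax-table)
    (let ((case-fold-search nil))
      (string-match regexp string))))

;; Word boundaries.
(demo-match-start "\\bfoo\\b" "a foo b")        ; => 2
(demo-match-start "\\bfoo\\b" "foobar")         ; => nil
(demo-match-start "\\bballs?\\b" "two balls")   ; => 4
(demo-match-start "\\Boo" "foo")                ; => 1
;; `^' and `$' match at line boundaries; \` and \' only at the very
;; beginning and end of the string.
(demo-match-start "^foo" "a\nfoo")              ; => 2
(demo-match-start "\\`foo" "a\nfoo")            ; => nil
(demo-match-start "foo$" "foo\nbar")            ; => 0
(demo-match-start "foo\\'" "foo\nbar")          ; => nil
;; `bar' starts a word after `_', but it does not start a symbol.
(demo-match-start "\\<bar" "foo_bar")           ; => 4
(demo-match-start "\\_<bar" "foo_bar")          ; => nil
```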

   Not every string is a valid regular expression.  For example, a
string that ends inside a character alternative without terminating `]'
is invalid, and so is a string that ends with a single `\'.  If an
invalid regular expression is passed to any of the search functions, an
`invalid-regexp' error is signaled.
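
   A program that receives a regexp from an untrusted source, such as
user input, can catch this error with `condition-case'.  The following
is a minimal sketch; the function name `demo-valid-regexp-p' is
hypothetical.

```elisp
;; `demo-valid-regexp-p' is a hypothetical name used only in this sketch.
(defun demo-valid-regexp-p (regexp)
  "Return t if `string-match' accepts REGEXP, nil if it is invalid."
  (condition-case nil
      (progn
        ;; Only validity matters here, so match against an empty string.
        (string-match regexp "")
        t)
    (invalid-regexp nil)))

(demo-valid-regexp-p "[a-z]")   ; => t
(demo-valid-regexp-p "[a-z")    ; => nil  (no terminating `]')
(demo-valid-regexp-p "foo\\")   ; => nil  (ends with a single `\')
```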


File: elisp,  Node: Regexp Example,  Next: Regexp Functions,  Prev: Syntax of Regexps,  Up: Regular Expressions

34.3.2 Complex Regexp Example
-----------------------------

Here is a complicated regexp which was formerly used by Emacs to
recognize the end of a sentence together with any whitespace that
follows.  (Nowadays Emacs uses a similar but more complex default
regexp constructed by the function `sentence-end'.  *Note Standard
Regexps::.)

   First, we show the regexp as a string in Lisp syntax to distinguish
spaces from tab characters.  The string constant begins and ends with a
double-quote.  `\"' stands for a double-quote as part of the string,
`\\' for a backslash as part of the string, `\t' for a tab and `\n' for
a newline.

     "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"

In contrast, if you evaluate this string, you will see the following:

     "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
          => "[.?!][]\"')}]*\\($\\| $\\|  \\|  \\)[
     ]*"

In this output, tab and newline appear as themselves.

   This regular expression contains four parts in succession and can be
deciphered as follows:

`[.?!]'
     The first part of the pattern is a character alternative that
     matches any one of three characters: period, question mark, and
     exclamation mark.  The match must begin with one of these three
     characters.  (This is one point where the new default regexp used
     by Emacs differs from the old.  The new value also allows some
     non-ASCII characters that end a sentence without any following
     whitespace.)

`[]\"')}]*'
     The second part of the pattern matches any closing brackets,
     parentheses, braces and quotation marks, zero or more of them,
     that may follow the period, question mark or exclamation mark.
     The `\"' is Lisp syntax for a double-quote in a string.  The `*'
     at the end indicates that the immediately preceding regular
     expression (a character alternative, in this case) may be
     repeated zero or more times.

`\\($\\| $\\|\t\\|  \\)'
     The third part of the pattern matches the whitespace that follows
     the end of a sentence: the end of a line (optionally with a
     space), or a tab, or two spaces.  The double backslashes mark the
     parentheses and vertical bars as regular expression syntax; the
     parentheses delimit a group and the vertical bars separate
     alternatives.  The dollar sign is used to match the end of a line.

`[ \t\n]*'
     Finally, the last part of the pattern matches any additional
     whitespace beyond the minimum needed to end a sentence.
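
   To see the four parts working together, the following sketch applies
the regexp to a short string with `string-match'.  The variable name
`old-sentence-end' and the sample text are invented for this
illustration.

     (let ((old-sentence-end
            "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*")
           (text "He said \"Stop!\"  Then he left."))
       (when (string-match old-sentence-end text)
         (list (match-beginning 0) (match-end 0))))
          => (13 17)

The match starts at the `!', takes in the closing double-quote, and
ends after the two spaces that follow.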


File: elisp,  Node: Regexp Functions,  Prev: Regexp Example,  Up: Regular Expressions

34.3.3 Regular Expression Functions
-----------------------------------

These functions operate on regular expressions.

 -- Function: regexp-quote string
     This function returns a regular expression whose only exact match
     is STRING.  Using this regular expression in `looking-at' will
     succeed only if the next characters in the buffer are STRING;
     using it in a search function will succeed if the text being
     searched contains STRING.

     This allows you to request an exact string match or search when
     calling a function that wants a regular expression.

          (regexp-quote "^The cat$")
               => "\\^The cat\\$"

     One use of `regexp-quote' is to combine an exact string match with
     context described as a regular expression.  For example, this
     searches for the string that is the value of STRING, surrounded by
     whitespace:

          (re-search-forward
           (concat "\\s-" (regexp-quote string) "\\s-"))

 -- Function: regexp-opt strings &optional paren
     This function returns an efficient regular expression that will
     match any of the strings in the list STRINGS.  This is useful when
     you need to make matching or searching as fast as possible--for
     example, for Font Lock mode.

     If the optional argument PAREN is non-`nil', then the returned
     regular expression is always enclosed by at least one
     parentheses-grouping construct.  If PAREN is `words', then that
     construct is additionally surrounded by `\<' and `\>'.

     This simplified definition of `regexp-opt' produces a regular
     expression which is equivalent to the actual value (but not as
     efficient):

          (defun regexp-opt (strings &optional paren)
            (let ((open-paren (if paren "\\(" ""))
                  (close-paren (if paren "\\)" "")))
              (concat open-paren
                      (mapconcat 'regexp-quote strings "\\|")
                      close-paren)))
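
     As a usage sketch (the keyword list is invented for illustration),
     passing `words' for PAREN makes the result match only whole words.
     The exact regexp that `regexp-opt' returns is not shown, since
     only its matching behavior matters here:

          (let ((re (regexp-opt '("cat" "car" "dog") 'words)))
            (list (string-match re "my cat")
                  (string-match re "concatenate")))
               => (3 nil)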

 -- Function: regexp-opt-depth regexp
     This function returns the total number of grouping constructs
     (parenthesized expressions) in REGEXP.  This does not include shy
     groups (*note Regexp Backslash::).
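
     For instance, in this hand-written regexp only the first and third
     groups are counted, because the second is a shy group:

          (regexp-opt-depth "\\(a\\)\\(?:b\\)\\(c\\)")
               => 2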


File: elisp,  Node: Regexp Search,  Next: POSIX Regexps,  Prev: Regular Expressions,  Up: Searching and Matching

34.4 Regular Expression Searching
=================================

In GNU Emacs, you can search for the next match for a regular
expression either incrementally or not.  For incremental search
commands, see *note Regular Expression Search: (emacs)Regexp Search.
Here we describe only the search functions useful in programs.  The
principal one is `re-search-forward'.

   These search functions convert the regular expression to multibyte if
the buffer is multibyte; they convert the regular expression to unibyte
if the buffer is unibyte.  *Note Text Representations::.

 -- Command: re-search-forward regexp &optional limit noerror repeat
     This function searches forward in the current buffer for a string
     of text that is matched by the regular expression REGEXP.  The
     function skips over any amount of text that is not matched by
     REGEXP, and leaves point at the end of the first match found.  It
     returns the new value of point.

     If LIMIT is non-`nil', it must be a position in the current
     buffer.  It specifies the upper bound to the search.  No match
     extending after that position is accepted.

     If REPEAT is supplied, it must be a positive number; the search is
     repeated that many times; each repetition starts at the end of the
     previous match.  If all these successive searches succeed, the
     search succeeds, moving point and returning its new value.
     Otherwise the search fails.  What `re-search-forward' does when
     the search fails depends on the value of NOERROR:

    `nil'
          Signal a `search-failed' error.

    `t'
          Do nothing and return `nil'.

    anything else
          Move point to LIMIT (or the end of the accessible portion of
          the buffer) and return `nil'.
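
     The following sketch contrasts the two non-`nil' cases in a
     temporary buffer whose text is invented for illustration.  Any
     value other than `nil' or `t' would do in place of the symbol
     `move':

          (with-temp-buffer
            (insert "no digits here")
            (goto-char (point-min))
            (list (re-search-forward "[0-9]" nil t)
                  (point)
                  (re-search-forward "[0-9]" nil 'move)
                  (= (point) (point-max))))
               => (nil 1 nil t)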

     In the following example, point is initially before the `T'.
     Evaluating the search call moves point to the end of that line
     (between the `t' of `hat' and the newline).

          ---------- Buffer: foo ----------
          I read "-!-The cat in the hat
          comes back" twice.
          ---------- Buffer: foo ----------

          (re-search-forward "[a-z]+" nil t 5)
               => 27

          ---------- Buffer: foo ----------
          I read "The cat in the hat-!-
          comes back" twice.
          ---------- Buffer: foo ----------
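
     A common idiom, sketched here with invented sample text, is to
     call `re-search-forward' in a loop with NOERROR set to `t' so that
     the loop ends quietly when no further match exists:

          (with-temp-buffer
            (insert "alpha 12 beta 345 gamma 6")
            (goto-char (point-min))
            (let (numbers)
              (while (re-search-forward "[0-9]+" nil t)
                (push (match-string 0) numbers))
              (nreverse numbers)))
               => ("12" "345" "6")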

 -- Command: re-search-backward regexp &optional limit noerror repeat
     This function searches backward in the current buffer for a string
     of text that is matched by the regular expression REGEXP, leaving
     point at the beginning of the first text found.

     This function is analogous to `re-search-forward', but they are not
     simple mirror images.  `re-search-forward' finds the match whose
     beginning is as close as possible to the starting point.  If
     `re-search-backward' were a perfect mirror image, it would find the
     match whose end is as close as possible.  However, in fact it
     finds the match whose beginning is as close as possible (and yet
     ends before the starting point).  The reason for this is that
     matching a regular expression at a given spot always works from
     beginning to end, and starts at a specified beginning position.

     A true mirror-image of `re-search-forward' would require a special
     feature for matching regular expressions from end to beginning.
     It's not worth the trouble of implementing that.
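
     The asymmetry shows up with a repeated pattern.  In this sketch
     (the buffer text is invented), searching backward from the end of
     the buffer finds only the last `a', because that is the closest
     position at which a match can begin; a forward search from the
     start of the same buffer would match all four characters.

          (with-temp-buffer
            (insert "aaaa")
            (re-search-backward "a+")
            (list (point) (match-string 0)))
               => (4 "a")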

 -- Function: string-match regexp string &optional start
     This function returns the index of the start of the first match for
     the regular expression REGEXP in STRING, or `nil' if there is no
     match.  If START is non-`nil', the search starts at that index in
     STRING.

     For example,

          (string-match
           "quick" "The quick brown fox jumped quickly.")
               => 4
          (string-match
           "quick" "The quick brown fox jumped quickly." 8)
               => 27

     The index of the first character of the string is 0, the index of
     the second character is 1, and so on.

     After this function returns, the index of the first character
     beyond the match is available as `(match-end 0)'.  *Note Match
     Data::.

          (string-match
           "quick" "The quick brown fox jumped quickly." 8)
               => 27

          (match-end 0)
               => 32
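
     Because START can be any index, a program can walk through all
     the matches in a string by restarting each search where the
     previous match ended.  A minimal sketch, with local variable
     names chosen for illustration:

          (let ((string "The quick brown fox jumped quickly.")
                (start 0)
                positions)
            (while (string-match "quick" string start)
              (push (match-beginning 0) positions)
              (setq start (match-end 0)))
            (nreverse positions))
               => (4 27)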

 -- Function: string-match-p regexp string &optional start
     This predicate function does what `string-match' does, but it
     avoids modifying the match data.
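
     In this sketch, the call to `string-match-p' succeeds, yet
     `match-beginning' still reports the earlier `string-match':

          (progn
            (string-match "quick" "The quick brown fox")
            (string-match-p "fox" "The quick brown fox")
            (match-beginning 0))
               => 4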

 -- Function: looking-at regexp
     This function determines whether the text in the current buffer
     directly following point matches the regular expression REGEXP.
     "Directly following" means precisely that: the search is
     "anchored" and it can succeed only starting with the first
     character following point.  The result is `t' if so, `nil'
     otherwise.

     This function does not move point, but it updates the match data,
     which you can access using `match-beginning' and `match-end'.
     *Note Match Data::.  If you need to test for a match without
     modifying the match data, use `looking-at-p', described below.

     In this example, point is located directly before the `T'.  If it
     were anywhere else, the result would be `nil'.

          ---------- Buffer: foo ----------
          I read "-!-The cat in the hat
          comes back" twice.
          ---------- Buffer: foo ----------

          (looking-at "The cat in the hat$")
               => t
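
     When the regexp contains groups, the match data recorded by
     `looking-at' gives access to them.  This sketch recreates the
     buffer text above in a temporary buffer and places point before
     the `T':

          (with-temp-buffer
            (insert "I read \"The cat in the hat\ncomes back\" twice.")
            (goto-char 9)
            (when (looking-at "The \\([a-z]+\\)")
              (list (match-string 1) (point))))
               => ("cat" 9)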

 -- Function: looking-back regexp &optional limit greedy
     This function returns `t' if REGEXP matches text before point,
     ending at point, and `nil' otherwise.

     Because regular expression matching works only going forward, this
     is implemented by searching backwards from point for a match that
     ends at point.  That can be quite slow if it has to search a long
     distance.  You can bound the time required by specifying LIMIT,
     which says not to search before LIMIT.  In this case, the match
     that is found must begin at or after LIMIT.

     If GREEDY is non-`nil', this function extends the match backwards
     as far as possible, stopping when a single additional previous
     character cannot be part of a match for REGEXP.  When the match is
     extended, its starting position is allowed to occur before LIMIT.

          ---------- Buffer: foo ----------
          I read "-!-The cat in the hat
          comes back" twice.
          ---------- Buffer: foo ----------

          (looking-back "read \"" 3)
               => t
          (looking-back "read \"" 4)
               => nil
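
     The effect of GREEDY is visible in the match data.  In this
     sketch (the buffer text is invented), the plain call records a
     match of one space, while the greedy call extends it over all
     three:

          (with-temp-buffer
            (insert "foo   ")
            (list (progn (looking-back " +" nil)
                         (match-beginning 0))
                  (progn (looking-back " +" nil t)
                         (match-beginning 0))))
               => (6 4)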

 -- Function: looking-at-p regexp
     This predicate function works like `looking-at', but without
     updating the match data.

 -- Variable: search-spaces-regexp
     If this variable is non-`nil', it should be a regular expression
     that says how to search for whitespace.  In that case, any group of
     spaces in a regular expression being searched for stands for use of
     this regular expression.  However, spaces inside of constructs
     such as `[...]' and `*', `+', `?' are not affected by
     `search-spaces-regexp'.

     Since this variable affects all regular expression search and match
     constructs, you should bind it temporarily for as small as possible
     a part of the code.
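
     A minimal sketch of such a temporary binding, using `let' and an
     invented sample string that contains a space, a tab, a newline and
     two more spaces between the words:

          (let ((text "foo \t\n  bar"))
            (list (let ((search-spaces-regexp nil))
                    (string-match "foo bar" text))
                  (let ((search-spaces-regexp "[ \t\n]+"))
                    (string-match "foo bar" text))))
               => (nil 0)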