Text has been rephrased or modified that does not exist in the original source.
[ed] Edward J. Sabol sabol A T alderaan gsfc nasa gov
[elijah] Eli the Bearded process A T qz little-neck ny us
[hal] Hal Wine hal A T dtor com
[jari] Jari Aalto jari aalto A T poboxes dt com
[philip] Philip Guenther guenther A T gac edu
[richard] Richard Kabel rkabel A T sequent com
[sean] Sean B. Straw PSE-L A T mail professional org
[timothy] Timothy J Luoma luomat+procmail A T luomat peak org
[walter] Walter Dnes waltdnes A T interlog com
[FAQ] Procmail FAQ era A T iki.fi
[manual] Quote from some procmail manual page
[maintainer] As of 2000-09 the maintainer is [jari]
A big thanks goes to all these people:
- 1999-06-16 Mark Seiden <mis A T seiden.com> did an enormous
job of proofreading v1.74. He sent a massive 105k patch
with many editorial corrections.
- 1999-01-08 Steven Alexander <stevena A T teleport.com>
thought that a small perl script would help me fix
spelling mistakes more easily. The script has been a much
better correction program than anything else.
- 1999 <Guido.Van.Hoecke A T se.bel.alcatel.be> took v1.48
and sent a huge 55k patch to correct many grammar
typos.
- 1998-10-28 Richard Kabel <rkabel A T sequent.com> sent a
massive patch to correct the language and provided excellent
improvement comments.
- 1998 Era Eriksson proofread v1.12 and sent numerous
corrections.
- Karl E. Vogel <vogelke A T c17mis.region2.wpafb.af.mil>
sent numerous new anti-spam links to be added to the
document.
- 1998 John Gianni <jjg A T cadence.com> sent nice
recipes: one is now in the procmail module list and the
other ideas I have added to this tips file.
- 1998 Tim Potter <tpot A T zip.com.au> had a spare moment
with v1.27 and sent a patch to correct spelling mistakes.
1.4 Version information
Here is the version and file size log of the text file, which gives
you some idea of how the document has evolved.
v2.42 2008-03-10 510 Add Gmane URL. Links checked.
v2.36 2007-10-02 519 New HTML/CSS layout. Links checked.
v2.30 2006-02-15 519 Sanitized all email addresses.
v2.27 2004-10-10 516 Spam related things removed.
v2.16 2002-08-31 596 Removed old UBE pointers.
v2.13 2002-08-13 596 Removed old UBE pointers.
v2.5 2002-02-01 608 Spelling checked with Emacs ispell
v2.2 2002-01-28 608 URL links checked and updated
v2.0 2001-08-09 608 http://pm-doc.sourceforge.net opened.
v1.77 1999-12-27 603 Netscape spam filters added
v1.76 1999-10-01 602 Mark Seiden's patch applied. Now under CVS.
v1.74 1999-04-26 599 document moved to www.procmail.org
v1.72 1999-04-21 597 Links corrected
v1.71 1999-03-29 597 Ricochet -- Perl script to fight UBE
v1.70 1999-02-26 592 procmail's Y2K compliance
v1.69 1999-02-23 590 RFC and using MIME in Usenet postings
v1.68 1998-01-29 587 Added "Lua" language pointer
v1.67 1998-01-07 579 Eli's procmail recipes in module section
v1.66 1998-12-14 578 Philip took care of bugs/patches listing
v1.64 1998-11-26 602 More Richard's comments integrated
v1.63 1998-10-30 595 Richard's english correction patch
v1.60 1998-10-21 591 UMASK, .forward if procmail already is LDA
v1.58 1998-10-12 583 SmartList and other MLM software discussed
v1.57 1998-10-06 575 PLUS addr. Convert HTML body to text
v1.55 1998-08-29 565 Fetching fields with formail -x
v1.53 1998-08-24 554 Procmail doesn't pass 8bit characters
v1.52 1998-08-24 553 Flag c forking study, procmail wish list
v1.51 1998-08-18 541 Small changes. MIME notes
v1.49 1998-08-10 529 Guido.Van.Hoeck's 55k patch applied
v1.46 1998-06-24 526 Added live urls to procmail archive
v1.45 1998-06-23 521 All recipes checked by eye. Many fixes.
v1.44 1998-06-19 516 Detecting mailing lists with pm-jalist.rc
v1.41 1998-06-17 510 How to disable recipe quickly with
v1.36 1998-04-03 493 Includerc rewritten, plus addressing
v1.34 1998-04-02 488 ORing and supreme scoring added
v1.32 1998-03-23 471 All recipes checked (by eye)
v1.31 1998-03-10 469 Better ordering: ORing rules discussed
v1.29 1998-01-30 429 "regexp" section rewrite.
v1.24 1997-12-30 415 up till 1996-12 is now included
v1.17 1997-12-09 343 up till archive 1996-07 now included
v1.14 1997-11-25 260
v1.13 1997-11-08 218 Era's correction suggestions.
v1.10 1997-10-13 181 archive file 1995-10's tips included
v1.9 1997-10-11 142
v1.8 1997-10-01 127
v1.6 1997-09-18 94
v1.5 1997-09-16 76
v1.05 1997-09-14 53
v1.01 1997-09-13 46 (k)
1.5 Document layout and maintenance
The base version of this document is kept in plain text
format, which requires no special editors or learning a markup
language. The tools that help maintain this document include:
The text version of this file is converted to HTML with:
perl -S <conversion program> --Auto-detect --Out pm-tips.txt
SENDING IMPROVEMENTS
If you have a spare moment and spot spelling mistakes or
misused verbs, please send a patch to the maintainer of this
page. The preferred way to send corrections to this document
is as diff(1) output, preferably in unified format (the -u
option). The steps below show how to make corrections and send
them forward. The source is available at
http://pm-doc.cvs.sourceforge.net/pm-doc/pm-doc/doc/tips/
cp pm-tips.txt pm-tips.txt.orig
# ... load pm-tips.txt into a text editor, edit, save
# ... generate the difference
diff -bwu pm-tips.txt.orig pm-tips.txt > pm-tips.txt.patch
# ... send the content of pm-tips.txt.patch by mail to the maintainer
If you do not know what a diff format is, then simply send
your comments in email. Use "Linux: pm-doc" as subject to
bypass spam filtering.
1.6 About presented recipes
The recipes have been kept as close to the originals as possible,
but the ideas have been generalized where necessary. If
some recipe doesn't work as announced, please a) send a note to
[maintainer] b) send mail to the procmail mailing list and ask how
to correct it. Sometimes a simple dot (.) has been used in
regular expressions, where the right, pedantic way would have
been to use an escaped dot. If you want to be very strict, you
should use the escaped dot where applicable.
# free hand version        # pedantic version
:0                         :0
* match.this.site          * match\.this\.site
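The difference between the two versions can be tried at the shell prompt. This is only a sketch with grep(1), whose dot handling is the same in this respect; the sample string is made up for the illustration.

```shell
# Hypothetical sample line: "X" and "Y" stand where the dots would be.
sample='matchXthisYsite'

# Unescaped dot: "." matches any character, so the sample matches.
loose=$(printf '%s\n' "$sample" | grep -c 'match.this.site' || true)

# Escaped dot: only a literal "." matches, so the sample does not match.
strict=$(printf '%s\n' "$sample" | grep -c 'match\.this\.site' || true)

echo "loose=$loose strict=$strict"
```

The free hand version accepts the bogus string, the pedantic version rejects it.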
Procmail also accepts assignments without quotes, like this:
var = value
num = 1
dir = /var/mail
But in this document a strict style has been adopted, where literal
strings are assigned with double quotes:
var = "value"
dir = "/var/mail"
That's because the procmail code checker (Emacs package
tinyprocmail.el) then won't warn about missing dollar-sign, which
might have very well been forgotten. Emacs package font-lock.el,
a syntax highlighting assistant, also displays double quoted string
in color.
# If you do this...
var = value
# then you might have made a typo. It is in fact not clear
# what was intended:
var = "value" # Did you mean: literal assignment?
var = $value # Did you mean: variable assignment?
Recipe flags are also not stuck together, because the visual
distinction between :0 and the flags is a valuable one. The reasoning
for which flags are kept together and in which order is explained
later in detail.
# Erm, all stuck together    # This may be visually clearer
:0ABDc:                      :0 A BD c:
1.7 Variables used in recipes
These are part of the procmail module pm-javar.rc and are used in
recipes.
# Pure newline; typical usage if you want to write
# Something directly to procmail's active logfile:
#
# LOG = "$NL message $NL"
NL = "
"
Refer to "improving Space-Tab syndrome" section for more details
WSPC = " " # whitespace: space + tab
SPC = "[$WSPC]" # Regexp: space + tab
SPCL = "($SPC|$)" # whitespace + linefeed: spc/tab/nl
NSPC = "[^$WSPC]" # negation
s = $SPC # shortname: like perl -- \s
d = "[0-9]" # A digit -- Perl \d
w = "[0-9a-z_A-Z]" # A word character -- Perl \w
W = "[^0-9a-z_A-Z]" # A non-word character -- Perl \W
a = "[a-zA-Z]" # A word character, alphabetic only
Writing recipes is now a little easier and may look clearer, at
least to people who are accustomed to reading Perl regular expression
short names:
# Matches "Header-Name: 11 12"
:0
*$ Header-Name:$s+$d+$s+$d
{
# Matched "whitespace" + "digit" + "whitespace" + "digit"
# Do something
}
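The same idea of building a regexp from short variables can be tried outside procmail. The sketch below is a shell analog using grep -E, not procmail syntax; the function name is hypothetical and [[:blank:]] stands in for the space + tab class.

```shell
# Shell analog of the shorthand variables; this is not procmail syntax.
s='[[:blank:]]'   # space or tab, like $s above
d='[0-9]'         # a digit, like $d above

# Succeed if the line looks like "Header-Name: <digits> <digit>".
matches_header() {
    printf '%s\n' "$1" | grep -Eq "^Header-Name:$s+$d+$s+$d"
}

if matches_header 'Header-Name: 11 12'; then
    echo "matched"
fi
```

A line with only one number, such as "Header-Name: 11", does not match because the second whitespace + digit pair is missing.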
SUPREME = 9876543210, is the highest score value that causes
procmail to bail out. [david] Actually the maximum is 2147483647,
but 9876543210 is easier to remember/type and will function just as
well.
PMSRC = Procmail module source code directory. Location
where *.rc files reside. Anywhere you want it to be. Usually
$HOME/pm or $HOME/procmail/lib. Here you can keep the
procmail files, log files and includerc scripts. Another
commonly used synonym is PMDIR.
SPOOL = Directory where your procmail delivers the categorized
messages. Like mailing lists:
list.procmail, list.lynx-users, list.emacs, list.elm
and work mail:
work.announcements, work.lab, work.doc, work.customer
and your private messages:
mail.usenet, mail.private, mail.default, mail.perl
and unimportant messages:
junk.daemon, junk.cron, junk.ube
If you read the procmail-delivered files directly, this directory
is usually $HOME/Mail or $HOME/mail. If you use some other software
that reads these files as mail spool files (like Emacs Gnus), then
this directory is typically $HOME/Mail/spool or similar.
MYXLOOP = Used to prevent re-sending messages that have already
been handled. Typically $LOGNAME@$HOST, but this can be any user
chosen string. Make it unique to your address. In this document
the definition is:
MYXLOOP = "X-Loop: $LOGNAME@$HOST"
SENDMAIL = Program to deliver composed mail. Usually standard
Unix sendmail(1), but it must have some switches with it. See the man
page for more. We use the following definition in scripts:
SENDMAIL = "sendmail -oi -t"
NICE = In a Unix environment you can lower the scheduling
priority with nice(1). If you are conscious of how many external
processes you launch for each piece of mail it would be polite to
lower the priority of such processes. You may see in this document
that external processes are called with NICE enabled:
:0 w # Same as "nice -10 script.pl"
| $NICE script.pl
IS functions = Functions to test file or directory attributes.
E.g. IS_EXIST is defined as "test -e" and so on. The definitions of
IS functions are system-dependent. E.g. on Irix the "-e" option
is not recognized and the nearest equivalent is "test -r". All IS
functions are defined in the pm-javar.rc module.
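How such a system-dependent definition could be chosen is sketched below in plain sh. This is not the code from pm-javar.rc; it only assumes that a test(1) without "-e" fails when asked about the root directory.

```shell
# Pick the file-existence test that this system's test(1) supports.
# Assumption: a failing "test -e /" means -e is unsupported, as the
# text reports for Irix; "test -r" is then the nearest equivalent.
if test -e / 2>/dev/null; then
    IS_EXIST="test -e"
else
    IS_EXIST="test -r"
fi

# Usage: the variable expands to a command prefix.
if $IS_EXIST /; then
    echo "root directory found with: $IS_EXIST"
fi
```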
1.8 About "useless use of cat award"
Randal Schwartz, a well-known Perl programmer and Perl book writer,
started giving awards for the "useless use of cat command"
whenever someone wrote examples without the token "<". Like this:
$ cat file.name.this | wc -l
Instead he writes that the call should have been written like this,
which saves the pipe (never mind that wc can read the file
directly; this is an example):
$ wc -l < file.name.this
[Paul David Fardy <pdf A T morgan.ucs.mun.ca>] There is weight
in the pipeline, but the true cost is in process startup. Try
running wc 100 times on /etc/motd or on this message. My tests show
the useless use of cat can double the real and processing time (in
the sample run below, real, user and system time each grow by
roughly 40 to 70 percent):
$ cat > /tmp/randal << 'EOF'
COUNT=100
i=1
while :
do
wc < /etc/motd > /dev/null
i=$(expr $i + 1)
[ "$i" = "$COUNT" ] && break
done
EOF
$ cat > /tmp/useless << 'EOF'
COUNT=100
i=1
while :
do
cat /etc/motd | wc > /dev/null
i=$(expr $i + 1)
[ "$i" = "$COUNT" ] && break
done
EOF
# NOTE: The timing values should not be read as absolute;
# examine the relative differences.
$ time sh /tmp/randal
real 0m0.568s
user 0m0.208s
sys 0m0.348s
$ time sh /tmp/useless
real 0m0.825s
user 0m0.348s
sys 0m0.476s
This becomes important, for example, when you decide to filter all
your mail with procmail, looking for virus signatures for example.
I might well decide to look only at the first 3 or 4 kilobytes.
It's not the size of messages (most are small anyway) but the
number of messages that causes a problem. Do you want to double the
processing cost of all our mail? I'm looking at a system-wide
filter for all my users' mail. I'm considering Sendmail's mail
filter versus procmail filtering. I'll likely be using a bit of
both. And given that all of the filtering is really just getting in
the way of legitimate traffic, it'd really piss me off if I naively
doubled the cost.
2.0 Procmail pointers
2.1 Where is procmail developed
Philip Guenther <guenther A T gac.edu> is currently taking care of and
coordinating procmail bug fixes. Please send any procmail bugs to
the mailing list or to bug@procmail.org. The development mailing
list is running SmartList at procmail-dev@procmail.org.
The newest Procmail code is at <http://www.procmail.org/> and
ready packages are available in Linux distributions' repositories.
2.2 Procmail resources
Procmail is discussed in the Usenet newsgroup comp.mail.misc and
on a mailing list that is also accessible through the NNTP server
<http://news.gmane.org/gmane.mail.procmail>.
2.3 Procmail mode for Emacs
If you use Emacs, see the
Procmail mode tinypm.el at
<http://freshmeat.net/projects/emacs-tiny-tools>. It can also
be used to statically syntax check recipes. Here is an example
of its output:
*** 1997-11-24 22:13 (pm.lint) 3.11pre7 tinypm.el 1.80
cd /users/jaalto/junk/
pm.lint:010: Warning, no right hand variable found. ([$`']
pm.lint:055: Pedantic, flag order style is not standard `hW:'
pm.lint:060: Warning, message dropped to folder, you need lock.
pm.lint:062: Warning, recipe with "|" may need `w' flag.
pm.lint:073: Warning, Formail used but no `f' flag found.
2.4 Procmail module library project
2.4.1 Where to get various modules
- Procmail module library.
The idea of plug-in modules was originally coined by Alan
Stebbens (<alan.stebbens A T software.com>, <alan.stebbens
A T openwave.com>).
2.4.2 Terminology
subroutine/module = A piece of code that gets something in
INPUT and responds with OUTPUT. Subroutine is not message
specific.
recipe = A piece of code that is somewhat self
contained: It reads something from the message or does
something according to matches in message. Recipe may be
message-specific. Recipe is more free-form and does not
follow strict INPUT/OUTPUT methodology.
2.4.3 Foreword to using modules
In the module listing, some of the modules are recipes and
some can be considered subroutines. Let's take the address
exploder. First, visualise the following familiar programming
language pseudo code:
(ret-val1, ret-val2 ...) = Function(arg1, arg2, arg3 ...)
Function may return multiple arguments and multiple
arguments can be passed to it. Clear so far. The concept
applies to procmail modules like this:
RC_FUNCTION = $PMSRC/pm-xxx.rc # name the subroutine/module
RC_FUNCTION2 = ...
INPUT = "value" # Set the arg1 for module
INCLUDERC = $RC_FUNCTION # Call Function( $arg1 )
:0 # Examine function's return value
* ERROR ?? yes
...
This should be pretty clear too. You just have to look into the
subroutine/module which you intend to use, to find out what
arguments it wants which you need to set (INPUT) before calling
it. The documentation also tells you what values are returned, e.g.
one of them was ERROR.
If it were a recipe, the call would be almost the same, but
instead of returning values, the recipe/module most likely does
something to your message or writes something to the data files
etc. A recipe is much higher level, because it may
call multiple subroutines/modules. The distinction between the
subroutine and recipe module types is not crystal clear, but I hope
the above clarifies the Procmail module/subroutine/recipe
concept a bit.
2.4.4 Header file modules
These are like #include .h files in C: they define common
variables, but do not contain actual code.
- pm-javar.rc – Defines standard variables: SPC WSPC NSPC SPCL and
perl styled \s \d \D \w \W and \a \A (alphabetic characters only)
- headers.rc – From Alan's procmail-lib. Defines standard regexps
and macros: address, from, to, cc, list_precedence
2.4.5 General modules
- pm-jafrom.rc – Derive FROM field without calling formail
unnecessarily. If all else fails, use formail.
- get-from.rc – From Alan's procmail-lib. get the "best" From
address. Sets FROM and FRIENDLY, the latter being the "friendly"
user name sans address.
- pm-jaaddr.rc – Subroutine to extract various mail components
from INPUT. Like address=foo@example.com, net=com, account=foo...
- pm-jastore.rc – Subroutine for general mailbox delivery.
Define MBOX as the folder where to drop
message and this subroutine will store it appropriately.
Supports single mboxes, ".gz" mbox files, directory files and
MH folders with rcvstore.
2.4.6 Spam modules
Read "Thoughts about increasing spam annoyance" at
<http://pm-lib.sourceforge.net/README.html> which
explains these modules better in context "2.0 A lightweight
UBE block system with pure procmail".
- pm-jaube.rc – Subroutine to investigate the message
for known spam patterns like numeric address, invalid address,
Pegasus bulk mail, advertising slogans etc. This is the
generic Spam detection module. Needs only one external
program: nslookup(1) to verify the sender's domain. The
results of classification appear in returned variables
that the caller can use for deciding what to do. Optional
headers can be added to the message to announce the
results.
- pm-jaube-keywords.rc – Subroutine to scrutinize the
message against known spam keywords. This is the "bare
bones" and very simplistic (but fast) way to check if
message is Spam. The results of classification appear in
returned variables that the caller can use for deciding
what to do.
- pm-jaube-prg-runall.rc – An interface module to call
external statistical Bayesian spam classifier programs.
This subroutine will call other modules, like
pm-jaube-prg-bogofilter.rc (for bogofilter),
pm-jaube-prg-bsfilter.rc (for bsfilter) and many more
that help fighting spam. It is possible to activate
specific Bayesian programs available in the current host.
2.4.7 Mime modules
- pm-jamime.rc – Subroutine to read MIME headers and put the
mime version, boundary string, content-type information to
variables.
- pm-jamime-decode.rc – recipe to decode quoted-printable
or base64 encoding in the body.
- pm-jamime-kill.rc – Recipe for attachment killing: wipes out the
extra mime cruft leaving only the plain text. Applications for
killing: ms-tnef attachment (MS Explorer 7k),
HTML attachments (Netscape, MS Express) vcard (Netscape),
PCX attachment (Lotus Notes).
- pm-jamime-save.rc – Recipe for saving simple file attachment.
When you receive ONE file attachment in a message, this
recipe can save it in a separate directory. The content is
also decoded (base64,qp) while saving.
2.4.8 Filtering message body or headers
- pm-jadaemon.rc – Handle DAEMON messages by changing subject to
reflect a) the error reason b) to whom the message was originally
sent c) what the original subject was. Store the
DAEMON messages to a separate folder.
- pm-jasubject.rc – Standardize Subject "Re32: FW: Sv: message"
or any other derivative to de facto "Re: message"
- pm-janetmind.rc – [obsolete]
Reformat minder.netmind.com messages (no longer exists 2005).
The default 4k message is shortened to a few important lines.
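The kind of Subject clean-up described for pm-jasubject.rc can be sketched with sed(1). This is not the module's code; the function name and the list of prefixes (Re, Re with a number, FW, Fwd, Sv) are assumptions chosen for the illustration.

```shell
# Collapse a run of reply/forward prefixes into a single "Re: ".
# The prefix list is an illustrative assumption, not the module's list.
normalize_subject() {
    printf '%s\n' "$1" |
        sed -E 's/^(([Rr][Ee][0-9]*|[Ff][Ww][Dd]?|[Ss][Vv]): *)+/Re: /'
}

normalize_subject 'Re32: FW: Sv: message'   # prints "Re: message"
```

A subject without any of the listed prefixes passes through unchanged.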
2.4.9 Mailing list modules
- pm-jalist.rc – Subroutine to extract mailing list name from
message. Do you need to add a new recipe to your .procmailrc
every time you subscribe to new mailing list? If you do,
take a look at this module, which examines the message and
defines variable LIST to hold the mailing list name. You
can use it directly to save the messages adaptively to
correct folders. No more hand work and manual storing
of mailing list messages.
2.4.10 Miscellaneous modules
- pm-jaempty.rc – check if message body is empty (nothing
relevant). Define variable BODY_EMPTY to "yes" if the
message is empty and to "no" otherwise.
- pm-janslookup.rc – Run nslookup on given address. If you
compose return address with "formail -rt -x To:" you can
verify if domain is registered before sending reply. Uses cache
for already looked up domains. This module is also used
by the pm-jaube.rc to verify the sender's domain.
- guess-mua.rc – Guess the Mail User Agent and set MUA:
MH,PINE,MAIL
2.4.11 Low-level Date and time handling
For these, you get the date string from somewhere, then feed
it to some of these subroutines:
- pm-jatime.rc – a low-level subroutine. Parse time "hh:mm:ss"
from variable INPUT
- pm-jadate1.rc – a low-level subroutine. Parse date
"Tue, 31 Dec 1997 19:32:57" from variable INPUT
- pm-jadate2.rc – a low-level subroutine. Parse ISO standard date
"1997-11-01 19:32:57" from variable INPUT
- pm-jadate3.rc – a low-level subroutine. Parse date
Tue Nov 25 19:32:57 from variable INPUT
- pm-jadate4.rc – Call shell command "date" once to construct RFC
"Tue, 31 Dec 1997 19:32:57" and parse the YY MM HH and other
values. You usually use this subroutine if you can't get the date
anywhere else.
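To give an idea of what such date parsing produces, here is a sketch in plain sh that splits the ISO format handled by pm-jadate2.rc. It is not the module's implementation; INPUT mirrors the modules' input variable, and the output variable names are hypothetical.

```shell
# Split an ISO date-time string into fields.
# The output names (YYYY MM DD hh mm ss) are hypothetical.
INPUT='1997-11-01 19:32:57'

# IFS lists the three separators: dash, colon and space.
IFS='-: ' read -r YYYY MM DD hh mm ss << EOF
$INPUT
EOF

echo "$YYYY $MM $DD $hh $mm $ss"   # prints "1997 11 01 19 32 57"
```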
2.4.12 Higher-level Date and time handling
You use these recipes to get the date directly from the message:
- pm-jadate.rc – higher-level recipe. Read date from message's
headers: From_ Received, or call shell date if none succeeds.
- date.rc – higher-level recipe.
From Alan's procmail-lib: parse the date from headers
Resent-Date:, Date, and From
2.4.13 Forwarding and account modules
- pm-japop3.rc – Pop3 movemail implemented with procmail. You can
send a "pop3" request to move your messages from account X to
account Y. Each message is sent separately. This recipe listens
to "pop3" requests.
- pm-jafwd.rc – control forwarding remotely. You can change the
forward address with a "control message" or turn
forwarding on/off with a "control message"
- pm-japing.rc – Send short reply when subject contains the word
"ping" to show that the account is up and mail address is
valid.
- correct-addr.rc – From Alan's procmail lib. To help forward mail
from an OLD address to a NEW address, and do some mailing list
mail management. This recipe file is intended to make it easy
for users to forward their mail from their old address to a new
address, and, at the same time, educate their correspondents
about it by CC'ing them with the mail.
2.4.14 Vacation modules
- pm-javac.rc – A framework for your vacation replies. This
recipe will handle the vacation cache and compose an initial
reply, which you only need to fill in. (Like putting vacation
message to the body)
- ackmail.rc – From Alan's procmail lib. procmail rc to
acknowledge mail (with either a vacation message, or an
acknowledgment)
2.4.15 Message-id based modules
- pm-jadup.rc – Handle duplicate messages by Message-Id.
Store duplicate message in separate folder.
- dupcheck.rc – From Alan's procmail-lib. If the current mail has
a "Message-Id:" header, run the mail through "formail -D",
causing duplicate messages to be dropped. Can use MD5 hash in
cache.
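The principle behind duplicate detection by Message-Id is a small cache of ids already seen. The sketch below shows that principle in plain sh; it is not how formail -D or the modules are implemented, and the cache location and function name are hypothetical.

```shell
# Sketch of a Message-Id cache; not how formail -D is implemented.
# The cache location and the function name are hypothetical.
cache=${TMPDIR:-/tmp}/msgid-cache.$$
: > "$cache"
trap 'rm -f "$cache"' EXIT

# Return 0 (true) if the id was seen before; otherwise record it.
is_duplicate() {
    if grep -qxF -e "$1" "$cache"; then
        return 0
    fi
    printf '%s\n' "$1" >> "$cache"
    return 1
}

is_duplicate '<123@example.com>' || echo "first copy: deliver"
is_duplicate '<123@example.com>' && echo "second copy: duplicate"
```

A real cache also needs a size limit and locking, which is what the modules above take care of.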
2.4.16 Cron modules
- pm-jacron.rc – A framework for your daily cron tasks. This
recipe contains all the needed checks to ensure that your
includerc is called whenever a day changes. (Day change is
subject to messages you receive). Your own cron includerc is
run once a day.
2.4.17 Backup modules
- pm-jabup.rc – Save messages to backup directory and keep only N
messages per day. Idea by John Gianni. Note:
The implementation will always call shell for each message you
receive; so using this module is not recommended if you get
many messages per day. Instead, use the cron module to clean
the messages' backup directory only once a day, and not every time
a message arrives.
2.4.18 Confirmation modules
- pm-jacookie.rc – Handle cookie (unique id) confirmations.
Also known as Procmail authentication service (PAS). This
simple procmail module will accept messages only from
users who have returned a "cookie" key. You can use this
to protect some services before access. Uses subroutine
pm-jacookie1.rc, which generates the unique cookie; CRC 32
by default. NOTE: Please read page
<http://pm-lib.sf.net/README.html> before you start
thinking about using this module as a generic Challenge-Response
module to reduce spam.
2.5 Procmail code to filter UBE
Sysadmins, remember: spam filtering is much more
efficiently done in the MTA, especially if you are just
looking at From and To lines. For example, you can set up in
Exim a rule that blocks \d.*@aol\.com (that is any aol.com
local part that begins with a digit). AOL guarantees that
none of their addresses begin with a digit. Exim rejects
such bogus addresses at the SMTP level before the message is
received.
- pm-jaube.rc – Procmail module library's UBE filter.
After Daniel Smith
posted his spam recipes to the procmail mailing list, the code
was adopted and generalized to handle a lot more UBE.
Module needs no special setup and can be installed via
simple INCLUDERC. All UBE detection happens using procmail
rules with no external files needed. The module is
available in Procmail module library at
<http://freshmeat.net/projects/procmail-lib>.
- Catherine A. Hampton's "Spambouncer".
The attached set of procmail recipes/filters, which I
call The Spam Bouncer, are for users who are sick of spam
(unsolicited junk mail) and want to filter it out of their
mail as easily as possible. These recipes can be used as
shared recipes for a whole system, or by an individual for
their own mailbox only.
- Junkfilter.
by Gregory Sutter. Junkfilter is a user-configurable
procmail-based filter system for electronic mail. Recipes
include checks for forged headers, key words, common spam
domains, relay servers and many others.
- Nonplussed Spambouncer
Procmail module for bouncing spam. Requires sendmail with
plussed users.
3.0 Dry run testing
3.1 What is dry run testing?
It means that you call your procmail test script directly with a
sample test mail:
% procmail $HOME/pm/pm-test.rc < $HOME/tmp/test-mail.txt
The script pm-test.rc has the procmail recipe you're testing
or improving. The test-mail.txt is any valid mail message
containing the headers and body. You can make one with any
text editor, e.g. vi, pico, nano, emacs or xemacs.
Here's a simple test mail skeleton. Copy verbatim:
From: me@example.com
To: me@example.com (self test)
X-info: I'm just testing

BODY OF MESSAGE SEPARATED BY EMPTY LINE
txt txt txt txt txt txt txt txt txt txt
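If you prefer not to open an editor, the skeleton can also be written from the shell with a here-document. This is just a sketch; the scratch file location is a hypothetical choice. The last lines check that the headers really are separated from the body by an empty line.

```shell
# Write the skeleton to a scratch file; the path is a hypothetical example.
testmail=${TMPDIR:-/tmp}/test-mail.$$.txt

cat > "$testmail" << 'EOF'
From: me@example.com
To: me@example.com (self test)
X-info: I'm just testing

BODY OF MESSAGE SEPARATED BY EMPTY LINE
txt txt txt txt txt txt txt txt txt txt
EOF

# Count header lines: everything before the first empty line.
header_count=$(sed '/^$/q' "$testmail" | grep -c ':')
echo "headers: $header_count"
```

The file can then be fed to the test script as shown earlier, with "$testmail" in place of the test-mail.txt path.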
Remember that you can define environment variables as well in
the dry run call. Here's an example where procmail just executes
the script and does nothing fancy.
% procmail VERBOSE=on DEFAULT=/dev/null \
~/pm/pm-test.rc < ~/txt/test-mail.txt
Suppose the script prints something to log files, but you'd instead
like to get it all dumped to the screen. No problem: first find out
your terminal device by calling tty at the shell prompt and pass
that on the command line as LOGFILE. Both the regular log output and
any "LOG=" statements then go to the terminal:
# `tty' tells what to fill in /dev/..
% procmail VERBOSE=on DEFAULT=/dev/null \
LOGFILE=/dev/pts/0 \
~/pm/pm-test.rc < ~/txt/test-mail.txt
3.2 Why the From field is not okay after dry run?
Why it now says "From foo@bar Mon Sep 8 14:38:06 1997"?
Don't worry about this. It's a side-effect of running the
message through formail after having generated any auto-reply
– the auto-reply generated by "formail -rt" doesn't have a
"From " header (it's pointless for outgoing messages), so the
second formail adds one, not knowing that it'll just be
ignored by sendmail later (well, sendmail will extract the
date from it, but that's ignorable). You only see it because
you're saving to a folder instead of mailing it.
3.3 Getting default value of a procmail variable
There's always this way to learn a variable's initial value
(note the strong quotes), which Stephen uses to get procmail's
value for $SENDMAIL in the scripts that build SmartList:
procmail LOG='$PATH' DEFAULT=/dev/null /dev/null < /dev/null
Since LOGFILE hasn't been defined, $PATH will be printed to the
screen. One caution: if there are any variables in the definition
of $PATH (such as $HOME), they'll be expanded in the output.
4.0 Things to remember
4.1 Get the newest procmail
Lots of trouble surfaces only because you have an old
procmail version. Be sure to have the latest. Pester your sysadm or
ISP until they install it, and don't give up if you're
serious about using procmail. Here is a command to check your
procmail version number:
% procmail -v
4.2 Csh's tilde is not supported
Many shell users are accustomed to using tilde (~)
everywhere. Unfortunately procmail doesn't expand that to home
directories; just use $HOME. When you write procmail
recipes, think sh not bash. This mind set will
automatically get your brain tuned to the right programming
habits.
4.3 Be sure to write the recipe starting right
The recipe starts with :0 or just with : but the latter
one is somewhat dangerous and easy to miss. Beware writing it
0: as it happens easily. Always put a zero after the colon
that begins the recipe. In the first versions of procmail, you
would put the number of conditions, with a default of 1. That
was annoying, and the computer can do the counting easier, so
Stephen made it so that a count of 0 indicates that the
conditions are all the lines beginning with a *. The default
is one, unless the a, A, e, or E flags are given, in
which case the default is zero. ALWAYS START A RECIPE
WITH :0.
4.4 Always set SHELL
If your login shell is a C shell (csh or tcsh), avoid havoc:
as a precaution, always put the following at the top of your
$HOME/.procmailrc.
SHELL = /bin/sh
4.4.1 If system has no /bin/sh and you're forced to use csh/tcsh
[<kuhlmav A T elec.canterbury.ac.nz>] Csh and tcsh execute the
.cshrc first, THEN if, and only if it is the login shell (not
a sub shell) it executes the .login, which should contain
basic important system setting like stty commands. Likewise,
bash and ksh users are taught to define and export PATH in
profile, so our per-shell startup files would not have
clobbered the PATH set in .procmailrc the way your .cshrc did.
[philip] ...I have been told by other sysadmins that there are
systems on which csh was hacked to source the .login before the
cshrc. For various reasons I suspect these to be systems based on
older versions of BSD (say, 2.3 BSD).
As for tcsh, the order in which the .login and .cshrc is sourced is
a compile-time option which defaults to the .cshrc (or .tcshrc)
before the .login. There may be some wackos out there who change
the default in memory of the system(s) that they were raised on. I
suggest electroshock as the proper treatment.
...done sys admin on Crays, Convexes, Suns, SGIs, Decs, PC
running BSDI, Linux and Free BSD, and I have never run into a
system where the .cshrc is sourced AFTER the .login. If someone
goes to the trouble to change the order, I would love to know a
valid reason for it.
4.4.2 Procmail won't work well with SHELL set to csh derivate
[1998-08-17 PM-L <kuhlmav A T elec.canterbury.ac.nz> Volker Kuhlmann]
...The blame lies with procmail and its documentation. Obviously,
procmail is programmed with the assumption that the login shell is
a sh derivative. This assumption is a) not very nice, and b) not
stated in the otherwise very good documentation. Of course a user
can set SHELL to tcsh. If then procmail is too stupid to hack it,
it ought to say so clearly, and the above-mentioned questions of
people using tcsh will disappear from this list. One could also be
nice and point out pitfall (3) mentioned above in the procmail
docs. It is customary to have terminal configuration in .login. If
it is shifted to .cshrc it should be properly surrounded by if ..
endif. Perhaps it is not customary to configure the terminal in
bashrc (where else then? - only a rhetorical question), but that
is no reason to blame it on tcsh.
My .cshrc only setenvs the environment when it is a login shell
(shell level 1). Obviously procmail runs a login shell. As I said
earlier, there are good reasons for setting a full PATH
independently whether the shell is interactive or not. So, when
procmail executes programs with SHELL=tcsh, PATH is set to the tcsh
defaults. That may or may not be desirable, depending on the
individual case. No problem with that and avoidable (run tcsh with
-f). Nice if it was in the procmail docs.
But then, the PATH getting clobbered is not the point here (just a
side-effect I didn't realize until 2 people pointed it out).
4.5 Check and set PATH
It is very likely that the default PATH environment variable
that your $HOME/.procmailrc sees is not enough. To play it
safe, so that all the needed binaries can be found when
escaping to the shell in .procmailrc, set the PATH variable as the
very first statement. Listing directories that exist on one
system but not on another is harmless, and it makes it possible
to use the same $HOME/.procmailrc on multiple servers (like HP,
SUN, IBM, Linux):
PATH = \
$HOME/bin:\
/usr/local/gnu/bin:\
/usr/contrib/bin:\
/usr/local/bin:\
/opt/local/bin:\
/bin:\
/usr/bin:\
/usr/lib:\
/usr/ucb:\
/usr/sbin:\
/vol/bin:\
/vol/lib:\
/vol/local/bin:\
${PATH}
4.6 Keep the log on all the time
It's best to put these variables at the very start of
your .procmailrc. When you start using procmail, you
want to know at all times what is happening and why your
recipes didn't work as expected. The answer to almost all your
questions can be found in the log file. As the log file will
grow quite big, remember to set up a cron job to keep it
at a moderate size.
LOGFILE = $PMSRC/pm.log
LOGABSTRACT = "all"
VERBOSE = "on"
4.7 Never add a trailing slash for directories
Drop the trailing slash: it'll choke if you ever end up on
Apollo's DomainOS, where double slashes are network references.
On most other OSes a trailing slash will not choke
(they treat it like "/."), but it is not portable.
DIR = /full/path/to/www/directory/ # Wait...
FILE = $DIR/file # Ouch! Expands to .../directory//file
4.8 Remember what the term DELIVERED means
When procmail delivers a piece of mail, whether to a file or to a
pipe command, and the write succeeds, the mail is
considered to have been delivered, and processing of
that recipe file stops. Here is the relevant text from the man page:
...There are two kinds of recipes: delivering and non-delivering
recipes. If a delivering recipe is found to match, procmail
considers the mail (you guessed it) delivered and will cease
processing the rcfile after having successfully executed the
action line of the recipe. If a non-delivering recipe is found to
match, processing of the rcfile will continue after the action
line of this recipe has been executed.
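The difference can be sketched with three recipes. This is an illustration only: the folder names backup, reports and other are hypothetical, and the c flag is used in the way the procmailrc man page describes it, namely to make a delivering recipe behave like a non-delivering one.

```procmail
# Delivering recipe turned non-delivering by the c flag: a copy of
# every message is appended to the (hypothetical) folder "backup",
# and processing continues with the next recipe.
:0 c:
backup

# Delivering recipe: if the condition matches and the write succeeds,
# the mail counts as delivered and procmail stops reading the rcfile.
:0:
* ^Subject:.*report
reports

# Reached only by mail that the recipe above did not deliver.
:0:
other
```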
4.9 Beware of putting comments in the wrong places
You like commenting a lot, sticking comments everywhere possible?
Yes, I do that too, and I got into trouble because one is not that
free to comment code in procmail. Pay attention to the following
example:
:0 # comment ok
* condition # OUCH, ouch. This comment must not be here.
# Hm, Old procmail versions don't understand this
# Are you sure you want to put comments inside
# condition line?
* condition
{ # comment ok
# comment ok
:0 # comment ok
/dev/null # comment ok
} # comment ok
So, the place to watch is the condition line. Later procmail
versions may understand those, but if you intend to share your
recipe, play it safe and think about backward portability.
4.10 Brace placement
Be careful with your braces and remember that old procmail
versions aren't as forgiving as newer versions. Below you see the
classical "Test the OK condition first, and if that fails then do
something else". See the side comments.
:0
* condition
# No space allowed here!
{} # Wrong, needs at least _one_ space between the braces
:0 E
{do_something } # Again a mistake, must have surrounding spaces
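For comparison, here is a sketch of the same construct with the braces spaced so that older procmail versions should also accept it. The word condition is the same placeholder as in the example above, and the assignment VAR = "value" is a hypothetical stand-in for do_something.

```procmail
:0
* condition
# Empty block: at least one space between the braces.
{ }

:0 E
# A space after the opening brace and before the closing brace.
{ VAR = "value" }
```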
4.11 Local lockfile usage
Lock files are only needed when procmail is doing something that
should be serialized, i.e., when only one process at a time should
be doing it.
This generally means that any time you write to a file, you should
have a local lock, preferably based on the name of the file being
written to. Forwarding actions ('!'), and 99% of all filters don't
need lock files. However, if a filter action writes to a file while
filtering, then you may need a lock. Procmail always does kernel
locking when it writes mail to files via simple file actions. So
even if you forgot the lock colon, procmail tries to play safe if
kernel locking has been compiled in.
Beware misplacing the lock colon(:)
:0: a # Ouch! Wrong unless you want a lock file named a
:0 a: # Okay.
Note that in delivering recipes where you write the
content yourself with the > token, you must name the local lock
file explicitly, because procmail can't determine the lock file
by itself. It can only derive the lock file from the >> token.
However, putting a lock file on a recipe like the one below is, of
course, utterly useless, so you might as well omit the locking
entirely.
# Save the body of the last message to file mail.body
:0 b: mail.body$LOCKEXT
| cat > mail.body
- If the command line in the procmail rcfile contains ">>",
the name of the local lock file is implicit, and the second
colon alone is enough.
- If the command doesn't write to a file, or doesn't write to the
same file as anything else (including a matching letter that makes
procmail run the same command) that might run at the same time,
the local lock file is unnecessary.
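A minimal sketch of the implicit case follows. The file name mail.body is reused from the example above; the behavior relied on is the one documented in the procmailrc man page, where a bare second colon makes procmail use the file name following the first ">>" and append $LOCKEXT to it.

```procmail
# Append the body of every message to mail.body.  The command
# contains ">>", so the bare second colon is enough: procmail
# takes the file name after ">>" and appends $LOCKEXT to it.
:0 b:
| cat >> mail.body
```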
Watch this too. A nesting block that does not launch a clone
cannot take a local lock file on the recipe that starts the
braces. A nesting block that does launch a clone can. (see the
error)
:0: file$LOCKEXT
{
# error: "procmail: Extraneous local lock file ignored"
# - This lock file will be ignored
# - If the recipes inside the braces try to use file.lck
# as a lock file, then you'll have a deadlock situation.
:0 :
/tmp/tmp.mbx
}
Let me also explain why the w is so important. Notice that the
two recipes below are equivalent: the W is implicit. NOTE: this is
only true for a recipe that opens a nested block. On a recipe with
a program, forward, or delivery action, W is different from w,
and both are different from specifying neither.
:0 c: file$LOCKEXT        :0 Wc: file$LOCKEXT
{ ... }                   { ... }
To quote the comment in source code, "try and protect the user from
his blissful ignorance". The parent will always wait for the cloned
child to exit when a lock file is involved. The only question is
whether or not it should be logged. If you want failure of the
cloned child to be logged, then you should use the w flag, like this:
:0 wc: file$LOCKEXT
{ ... }
A local lockfile can be used to lock a clone; the parent procmail
will remove it when the clone exits (thus it serves as a global
lock file for the clone). If the braced block does not launch a
clone, asking for a local lock file generates an error.
4.12 Global lockfile
If you want to block everything while the recipe runs, even
during the conditions, use a global lock. For example, in this
construct the formail call that updates the message-id cache
file must be protected with a global lock file.
MID_CACHE_LEN = 8192
MID_CACHE_FILE = $PMSRC/msgid.cache
MID_CACHE_LOCK = $PMSRC/msgid.cache$LOCKEXT
LOCKFILE = $MID_CACHE_LOCK
:0
* ^Message-ID:
* ? $FORMAIL -D $MID_CACHE_LEN $MID_CACHE_FILE
{
LOG = "dupecheck: discarded $MESSAGEID from $FROM $NL"
:0 # no lockfile !
$DUPLICATE_MBOX
}
LOCKFILE # kill variable
You cannot use a local lockfile as below:
:0 : $MID_CACHE_FILE$LOCKEXT
* ^Message-ID:
* ? $FORMAIL -D $MID_CACHE_LEN $MID_CACHE_FILE
because the local lock file named on the flag line will be created
only if the conditions have matched and the action is attempted.
One more note: watch carefully that there is no : lock when
delivering to DUPLICATE_MBOX, because the outer global lock file
already prevents all other procmail instances from executing this
part of the recipe.
4.13 Gee, where do I put all those ! * $ ??
Ahem. I can't tell you exactly what to do or how to write your own
procmail recipes, but I can show you one possible style for the
token order on a condition line. That won't say much unless you have
something to compare it with, so here first is a perfectly valid
rule that does not follow that style.
:0
*$ ^Subject:.*$VAR
*! ^From:.*some
*B ! ?? match-the-string-in-body
*$? $IS_EXIST $FILE
*VARIABLE ?? set
It might be better to line things up in the condition lines. The first
column is reserved for the dollar sign, the second for the not operator,
and so on. The key here is that it is possible to see at a glance
whether a line uses variable expansion (the leftmost dollar sign).
:0
*$   ^Subject:.*$VAR
*  ! ^From:.*some
*  ! B ?? match-the-string-in-body
*$ ? $IS_EXIST $FILE
*    VARIABLE ?? set
 | | |
 | | |
 | | What is matched: (H)eader portion, (B)ody or (HB) both.
 | | The (??) associative operator is required.
 | |
 | Not operator (!) or shell call (?)
 |
 Variable expansion (important)
4.14 If you send an automatic reply, use an X-Loop header
Do not send an automatic reply without checking the "! ^FROM_DAEMON"
condition, and always include an X-Loop header and check for its
existence to prevent mail loops:
:0
* conditions-for-auto-reply
*$ ! ^$MYXLOOP
* ! ^FROM_DAEMON
| $FORMAIL -A "$MYXLOOP" ...other-headers...
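The skeleton above can be filled in along the lines of the auto-reply example in the procmailex man page. Treat the following as a sketch under assumptions: the address me@example.com and the reply text are placeholders, FORMAIL is assumed to be defined as shown, and SENDMAIL is procmail's built-in variable for the local mailer.

```procmail
# Assumptions: placeholder address, formail found via PATH.
FORMAIL = formail
MYXLOOP = "X-Loop: me@example.com"

# h: feed only the header to the command.
# c: work on a copy, so the original mail is still processed by
#    the recipes that follow.
:0 hc
* ! ^FROM_DAEMON
*$ ! ^$MYXLOOP
| ($FORMAIL -r -A "$MYXLOOP" -I "Precedence: junk"; \
   echo "Your mail has been received.") | $SENDMAIL -t
```

The reply carries the X-Loop header, so if it ever comes back to this rcfile, the second condition stops another reply from being generated.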
4.15 Avoid extra shell layer and check command for SHELLMETAS
[dan] It is very important to study your shell command calls and try to
save the overhead of the extra shell layer. It may be extra work
once when you write your rcfile, but it saves effort on each piece of
arriving mail. When procmail sees a character from SHELLMETAS, it
runs
# Default SHELLMETAS: &|<>~;?*[
# Default $SHELLFLAGS: -c
% $SHELL $SHELLFLAGS "command -opts args"
instead of executing the command directly.
That is because procmail's ability to invoke other programs does not
include filename globbing ([, *, ?), backgrounding (&), piping
(|), succession (;), nor conditional succession (&&, ||). If it
sees any of those characters (before expanding variables), it hands the
job over to a shell.
Sometimes those characters appear in arguments to a command without
having their shell meta meaning and procmail really could invoke the
command directly without the shell. You can see the distinction in a
verbose log file: if procmail runs the command itself, it logs
Executing "command,-opts,args"
with a comma between each positional parameter, but if it calls a
shell, the original spacing from the rcfile appears unchanged in
the logfile:
Executing "command -opts args"
So, if you know you won't be needing shell expansion, wrap your
shell calls with this:
savedMetas = $SHELLMETAS
SHELLMETAS # Kill variable
..command that does not need shell expansion features..
SHELLMETAS = $savedMetas
4.16 Think what shell commands you use
For every message, procmail launches the processes you have
put into your $HOME/.procmailrc. If you haven't paid
attention to optimization before, now is the time to
take a magnifying glass and check every recipe and the
processes in it. When you write your private shell scripts,
the performance hit is not so important, but for mail
delivery the matter is totally different. First, let's look at
some programs and their sizes. The following is from one Unix
system, where the binaries include debug and symbol table
code.
131072 /usr/bin/awk
196608 /usr/bin/sort
245760 /usr/bin/grep
262144 /usr/bin/sed
303552 /usr/local/bin/gawk
544768 /usr/contrib/bin/perl [perl 4.36]
822232 /opt/local/bin/perl
         text     data      bss
awk:    72727 +  51316 +  15317 = 139360
sort:  173225 +  18496 + 183076 = 374797
sed:   237248 +  16992 +  56252 = 310492
grep:  221591 +  16176 +  53816 = 291583
perl4: 502220 +  36044 +  65632 = 603896
perl5: 633812 +  69612 +   2385 = 705809
gawk:  160018 +   5264 +   7168 = 172450
The binary sizes above are not typical; these are from
another system:
4 Sep 28 /usr/local/bin/awk -> gawk
32768 Nov 16 /usr/bin/grep
49152 Nov 16 /usr/bin/sed
114688 Oct 20 /usr/local/contrib/gnu/bin/grep
155648 Nov 16 /usr/bin/awk
155648 Nov 16 /usr/bin/nawk
221184 Nov 16 /usr/bin/gawk
311296 Jan 27 /usr/local/bin/gawk
958464 Nov 2 /usr/local/contrib/bin/perl
1196032 Sep 14 /usr/local/bin/perl
Stan Ryckman <stanr A T sunspot.tiac.net> wants you to know that:
Comparing byte sizes on disk means nothing here... these
things may or may not have been stripped. Any symbol tables included
in the byte counts you see above won't affect process start-up time.
The size command will give a better handle on what will be needed
in starting a process. The three segments may each have their own
overhead, though, and the relative contributions of those segments
to startup time may well be system-dependent.
Hm. Can we draw some conclusion? Not anything definitive, but at
least something:
- While sed(1) and grep(1) may be bigger than awk(1)
on some systems, this is an exception. They are usually
much smaller. Still, it's more effective to use one awk process
instead of many combined filtering commands.
- Complex commands that would require many processes to be
chained together, like `grep -v | grep | sed', can
usually be accomplished with one awk(1) call. Ask somewhere
how to do it with awk(1) if you don't know the language;
it's quite similar to perl(1).
- Try to use standard awk(1). gawk(1) and nawk(1)
are bigger and may not be found on all systems.
- Avoid perl(1) at all costs; it's many times (about 6) bigger than
awk(1). Perl is slow to start up, due to the intermediate
compilation step at startup, and hogs oodles of memory.
- Remember that if procmail is running on a dedicated mail host, that
host probably doesn't even have any goodies installed, just the boring
standard versions, which may not even be the same as what you see
on your current host.
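As an illustration of the second point, a two-stage chain such as grep -v '^>' | grep urgent can be collapsed into one awk(1) process. The recipe below is only a sketch: the variable name URGENT_LINES and both patterns are made up for the example.

```procmail
# Capture, from the body, the lines that are not quoted (do not
# start with ">") and that mention "urgent".  One awk process
# replaces the chain  grep -v '^>' | grep urgent.
:0 b
URGENT_LINES=| awk '!/^>/ && /urgent/'
```

Because the command contains characters from SHELLMETAS, procmail still hands it to a shell, but only one filter process is started instead of two.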
Here are some more programs. Don't even think of extracting fields with
grep or awk, like "grep Subject", because formail is
much smaller and better optimized for tasks like that. Better yet,
you can often do it all with procmail's regexp matches.
37007 Sep 5 15:53 /usr/local/bin/formail # 3.11pre7
28672 Jun 10 1996 /usr/bin/tr
20480 Jun 10 1996 /usr/bin/tail
20480 Jun 10 1996 /usr/bin/cat
20480 Sep 26 1996 /usr/bin/expr
16384 Jun 10 1996 /usr/bin/head
16384 Jun 10 1996 /usr/bin/cut
16384 Jun 10 1996 /usr/bin/date
16384 Jun 10 1996 /usr/bin/uniq
16384 Jun 10 1996 /usr/bin/wc
12288 Jun 10 1996 /usr/bin/echo
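Both alternatives mentioned above can be sketched as follows. The variable name SUBJECT is chosen for the example, formail is assumed to be found via PATH, and the second form assumes a procmail version that supports the \/ match token. The two recipes are alternatives; use one or the other.

```procmail
# Alternative 1: one small formail process extracts the Subject field.
:0 h
SUBJECT=| formail -xSubject:

# Alternative 2: no external process at all.  Whatever matches to
# the right of the \/ token is stored by procmail in $MATCH.
:0
* ^Subject:[ ]*\/[^ ].*
{ SUBJECT = "$MATCH" }
```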
4.17 Using absolute paths when calling a shell program
Shell programmers know that if an absolute path is used for calling
an executable, the shell doesn't have to search through a long list of
directories in $PATH. This may speed up shell scripts remarkably.
The best way to use such an optimization is to define variables
that point to those programs.
Should you use such optimization in your procmail code? That is a