Text has been rephrased or modified that does not exist in the original source.
[ed] Edward J. Sabol sabol A T alderaan gsfc nasa gov
[elijah] Eli the Bearded process A T qz little-neck ny us
[hal] Hal Wine hal A T dtor com
[jari] Jari Aalto jari aalto A T poboxes dt com
[philip] Philip Guenther guenther A T gac edu
[richard] Richard Kabel rkabel A T sequent com
[sean] Sean B. Straw PSE-L A T mail professional org
[timothy] Timothy J Luoma luomat+procmail A T luomat peak org
[walter] Walter Dnes waltdnes A T interlog com
[FAQ] Procmail FAQ era A T iki.fi
[manual] Quote from some procmail manual page
[maintainer] As of 2000-09 the maintainer is [jari]
A big thanks goes to all these people:
- 1999-06-16 Mark Seiden <mis A T seiden.com> did an enormous
amount of work proofreading v1.74. He sent a massive 105k patch
with many editorial corrections.
- 1999-01-08 Steven Alexander <stevena A T teleport.com>
thought that a small perl script would help me to fix
spelling mistakes more easily. The script has been a much
better correction program than anything else.
- 1999 <Guido.Van.Hoecke A T se.bel.alcatel.be> took v1.48
and sent a huge 55k patch to correct many grammar mistakes
and typos.
- 1998-10-28 Richard Kabel <rkabel A T sequent.com> sent a
massive patch to correct the language and provided excellent
improvement comments.
- 1998 Era Eriksson proofread v1.12 and sent numerous
corrections.
- Karl E. Vogel <vogelke A T c17mis.region2.wpafb.af.mil>
sent numerous new anti-spam links to be added to the
document.
- 1998 John Gianni <jjg A T cadence.com> sent nice
recipes: one is now in the procmail module list and the
other ideas I have added to this tips file.
- 1998 Tim Potter <tpot A T zip.com.au> had a spare moment
with v1.27 and sent a patch to correct spelling mistakes.
1.4 Version information
Here is the version and file size log of the text file (sizes are in
kilobytes), which gives you an idea of how the document has evolved.
2008-09-21 510 Links checked.
2008-03-10 510 Add Gmane URL. Links checked.
2007-10-02 519 New HTML/CSS layout. Links checked.
2006-02-15 519 Sanitized all email addresses.
2004-10-10 516 Spam related things removed.
2002-08-31 596 Removed old UBE pointers.
2002-08-13 596 Removed old UBE pointers.
2002-02-01 608 Spelling checked with Emacs ispell
2002-01-28 608 URL links checked and updated
2001-08-09 608 http://pm-doc.sourceforge.net opened.
1999-12-27 603 Netscape spam filters added
1999-10-01 602 Mark Seiden's patch applied. Now under CVS.
1999-04-26 599 document moved to www.procmail.org
1999-04-21 597 Links corrected
1999-03-29 597 Ricochet -- Perl script to fight UBE
1999-02-26 592 procmail's Y2K compliance
1999-02-23 590 RFC and using MIME in Usenet postings
1999-01-29 587 Added "Lua" language pointer
1999-01-07 579 Eli's procmail recipes in module section
1998-12-14 578 Philip took care of bugs/patches listing
1998-11-26 602 More Richard's comments integrated
1998-10-30 595 Richard's english correction patch
1998-10-21 591 UMASK, .forward if procmail already is LDA
1998-10-12 583 SmartList and other MLM software discussed
1998-10-06 575 PLUS addr. Convert HTML body to text
1998-08-29 565 Fetching fields with formail -x
1998-08-24 554 Procmail doesn't pass 8bit characters
1998-08-24 553 Flag c forking study, procmail wish list
1998-08-18 541 Small changes. MIME notes
1998-08-10 529 Guido.Van.Hoeck's 55k patch applied
1998-06-24 526 Added live urls to procmail archive
1998-06-23 521 All recipes checked by eye. Many fixes.
1998-06-19 516 Detecting mailing lists with pm-jalist.rc
1998-06-17 510 How to disable recipe quickly with
1998-04-03 493 Includerc rewritten, plus addressing
1998-04-02 488 ORing and supreme scoring added
1998-03-23 471 All recipes checked (by eye)
1998-03-10 469 Better ordering: ORing rules discussed
1998-01-30 429 "regexp" section rewrite.
1997-12-30 415 up till 1996-12 is now included
1997-12-09 343 up till archive 1996-07 now included
1997-11-25 260
1997-11-08 218 Era's correction suggestions.
1997-10-13 181 archive file 1995-10's tips included
1997-10-11 142
1997-10-01 127
1997-09-18 94
1997-09-16 76
1997-09-14 53
1997-09-13 46 (k) |
1.5 Document layout and maintenance
The base version of this document is kept in plain text
format, which requires no special editors and no knowledge of a
markup language. The tools that help maintain this document include:
The text version of this file is converted to HTML with:
perl -S <conversion program> --Auto-detect --Out pm-tips.txt |
SENDING IMPROVEMENTS
If you have a spare moment and spot spelling mistakes or
misused verbs, please go ahead and send a patch to the
maintainer of this page. The preferred way to send
corrections to this document is as diff(1) output. Here's
how to make corrections and send them forward. Please try to use
the unified diff -u option. The source is available at
http://pm-doc.cvs.sourceforge.net/pm-doc/pm-doc/doc/tips/
cp pm-tips.txt pm-tips.txt.orig
... load the pm-tips.txt to a text editor / edit / save
... Generate the difference
diff -bwu pm-tips.txt.orig pm-tips.txt > pm-tips.txt.patch
...Send content of pm-tips.txt.patch by mail to maintainer |
If you do not know what a diff format is, then simply send
your comments in email. Use "Linux: pm-doc" as subject to
bypass spam filtering.
1.6 About presented recipes
The recipes have been kept as close to the originals as possible,
but the ideas have been generalized where necessary. If
some recipe doesn't work as announced, please a) send a note to
[maintainer], b) send mail to the procmail mailing list and ask
how to correct it. Sometimes a simple dot (.) has been used in
regular expressions, where the right, pedantic way would have
been to use an escaped dot. If you want to be very strict, you
should use the escaped dot where applicable.
# free hand version          # pedantic version
:0                           :0
* match.this.site            * match\.this\.site |
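To see why the pedantic form matters, here is a small shell sketch.
It uses grep(1) as a stand-in for procmail's regular expression
matching (an assumption made only for illustration; procmail has its
own egrep-like engine), and the helper name count_matches is
hypothetical. An unescaped dot matches any character, so a look-alike
string slips through the free hand pattern but not the pedantic one.

```shell
# count_matches PATTERN TEXT: print how many lines of TEXT match PATTERN.
# grep is only a stand-in here for procmail's own regexp matching.
count_matches() {
    printf '%s\n' "$2" | grep -c -e "$1" || true
}

# A string that is NOT the intended host name, dots replaced by letters.
sample='matchXthisYsite'

loose=$(count_matches 'match.this.site' "$sample")
strict=$(count_matches 'match\.this\.site' "$sample")

echo "free hand pattern matched $loose line(s)"
echo "pedantic pattern matched $strict line(s)"
```

The free hand pattern should report one match and the pedantic
pattern none.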
Procmail also accepts assignments without quotes, like this:
var = value
num = 1
dir = /var/mail |
But in this document a strict style has been adopted, where literal
strings are assigned with double quotes:
var = "value"
dir = "/var/mail" |
That's because the procmail code checker (Emacs package
tinyprocmail.el) then won't warn about a missing dollar sign, which
might very well have been forgotten. Emacs package font-lock.el,
a syntax highlighting assistant, also displays double-quoted strings
in color.
# If you do this...
var = value
# then you might have made a typo. It is in fact not clear
# what was intended:
var = "value" # Did you mean: literal assignment?
var = $value # Did you mean: variable assignment? |
Recipe flags are also not stuck together, because the visual
distinction between :0 and the flags is a valuable one. The reasoning
for which flags are kept together and in which order is explained
later in detail.
# Erm, all stuck             # This may be visually more clear
:0ABDc:                      :0 A BD c: |
1.7 Variables used in recipes
These are part of the procmail module pm-javar.rc and are used in
recipes.
# Pure newline; typical usage if you want to write
# Something directly to procmail's active logfile:
#
# LOG = "$NL message $NL"
NL = "
" |
Refer to "improving Space-Tab syndrome" section for more details
WSPC = " 	" # whitespace: space + tab
SPC = "[$WSPC]" # Regexp: space + tab
SPCL = "($SPC|$)" # whitespace + linefeed: spc/tab/nl
NSPC = "[^$WSPC]" # negation
s = $SPC # shortname: like perl -- \s
d = "[0-9]" # A digit -- Perl \d
w = "[0-9a-z_A-Z]" # A word character -- Perl \w
W = "[^0-9a-z_A-Z]" # A non-word character -- Perl \W
a = "[a-zA-Z]" # A word character, alphabetic only |
Writing recipes is now a little easier, and they may look clearer,
at least to people who are accustomed to reading Perl regular
expression short names:
# Matches "Header-Name: 11 12"
:0
*$ Header-Name:$s+$d+$s+$d
{
# Matched "whitespace" + "digit" + "whitespace" + "digit"
# Do something
} |
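The shorthand variables can be tried outside procmail as well. The
sketch below rebuilds $s and $d as shell variables and feeds the
resulting pattern to grep -E, which only approximates procmail's
matching (an assumption for illustration). The function name
matches_header and the sample header lines are made up for the
example.

```shell
# Build the same character classes that the procmail variables hold.
tab=$(printf '\t')
s="[ $tab]"        # space or tab, like $SPC
d="[0-9]"          # a digit, like Perl \d
pattern="^Header-Name:${s}+${d}+${s}+${d}"

# matches_header LINE: succeed when LINE matches the composed pattern.
matches_header() {
    printf '%s\n' "$1" | grep -Eq -e "$pattern"
}

if matches_header "Header-Name: 11 12"; then
    echo "digits after the header name: matched"
fi
if ! matches_header "Header-Name: eleven 12"; then
    echo "letters after the header name: not matched"
fi
```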
SUPREME = 9876543210, is the highest score value that causes
procmail to bail out. [david] Actually the maximum is 2147483647,
but 9876543210 is easier to remember/type and will function just as
well.
PMSRC = Procmail module source code directory. Location
where *.rc files reside. Anywhere you want it to be. Usually
$HOME/pm or $HOME/procmail/lib. Here you can keep the
procmail files, log files and includerc scripts. Another
commonly used synonym is PMDIR.
SPOOL = Directory where your procmail delivers the categorized
messages. Like mailing lists:
list.procmail, list.lynx-users, list.emacs, list.elm |
and work mail:
work.announcements, work.lab, work.doc, work.customer |
and your private messages:
mail.usenet, mail.private, mail.default, mail.perl |
and unimportant messages
junk.daemon, junk.cron, junk.ube |
If you read the procmail-delivered files directly, this directory
is usually $HOME/Mail or $HOME/mail. If you use some other software
that reads these files as mail spool files (like Emacs Gnus), then
this directory is typically $HOME/Mail/spool or similar.
MYXLOOP = Used to prevent re-sending messages that have already
been handled. Typically $LOGNAME@$HOST, but this can be any
user-chosen string. Make it unique to your address. In this document
the definition is:
MYXLOOP = "X-Loop: $LOGNAME@$HOST" |
SENDMAIL = Program to deliver composed mail. Usually the standard
Unix sendmail(1), but it must be given some switches. See the man
page for more. We use the following definition in scripts:
SENDMAIL = "sendmail -oi -t" |
NICE = In a Unix environment you can lower the scheduling
priority with nice(1). If you are conscious of how many external
processes you launch for each piece of mail it would be polite to
lower the priority of such processes. You may see in this document
that external processes are called with NICE enabled:
:0 w # Same as "nice -10 script.pl"
| $NICE script.pl |
IS functions; Functions to test file or directory attributes.
E.g. IS_EXIST is defined as "test -e" and so on. The definitions of
IS functions are system-dependent. E.g. On Irix the "-e" option
is not recognized and the nearest equivalent is "test -r". All IS
functions are defined in the pm-javar.rc module.
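How such a system-dependent definition might be chosen can be
sketched in shell. This is not the code from pm-javar.rc, only an
illustration of the idea: probe whether test(1) accepts -e and fall
back to -r otherwise. The variable name IS_EXIST is the one mentioned
above; the probing logic is an assumption.

```shell
# Probe on a path that certainly exists; a test(1) without -e
# support fails here and we fall back to the readable-file check.
if test -e / 2>/dev/null; then
    IS_EXIST="test -e"
else
    IS_EXIST="test -r"
fi

# Use the chosen definition like any other command.
if $IS_EXIST /etc; then
    echo "/etc exists (checked with: $IS_EXIST)"
fi
```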
1.8 About "useless use of cat award"
Randal Schwartz, a well-known Perl programmer and Perl book writer,
started giving awards for the "useless use of cat command"
whenever someone wrote examples without the token "<". Like this:
$ cat file.name.this | wc -l |
Instead he writes that the call should have been written like this,
which saves the pipe (never mind that wc can read the file
directly; this is an example):
$ wc -l < file.name.this |
[Paul David Fardy <pdf A T morgan.ucs.mun.ca>] There is weight
in the pipeline, but the true cost is in process startup. Try
running wc 100 times on /etc/motd or on this message. My tests
showed the useless use of cat doubling the real and processing
time; in the sample run below, real, user, and system time each
grow by roughly 35 to 70 percent:
$ cat > /tmp/randal << 'EOF'
COUNT=100
i=1
while :
do
wc < /etc/motd > /dev/null
i=$(expr $i + 1)
[ "$i" = "$COUNT" ] && break
done
EOF
$ cat > /tmp/useless << 'EOF'
COUNT=100
i=1
while :
do
cat /etc/motd | wc > /dev/null
i=$(expr $i + 1)
[ "$i" = "$COUNT" ] && break
done
EOF
# NOTE: The timing values should not be read as absolute;
# examine the relative differences.
$ time sh /tmp/randal
real 0m0.568s
user 0m0.208s
sys 0m0.348s
$ time sh /tmp/useless
real 0m0.825s
user 0m0.348s
sys 0m0.476s |
This becomes important, for example, when you decide to filter all
your mail with procmail, say to look for virus signatures.
I might well decide to look only at the first 3 or 4 kilobytes.
It's not the size of the messages (most are small anyway) but the
number of messages that causes a problem. Do you want to double the
processing cost of all your mail? I'm looking at a system-wide
filter for all my users' mail. I'm considering Sendmail's mail
filter versus procmail filtering. I'll likely be using a bit of
both. And given that all of the filtering is really just getting in
the way of legitimate traffic, it'd really piss me off if I naively
doubled the cost.
2.0 Procmail pointers
2.1 Where is procmail developed
Philip Guenther <guenther A T gac.edu> is currently taking care of and
coordinating procmail bug fixes. Please send any procmail bugs to
the mailing list or to bug@procmail.org. The development mailing
list runs on SmartList at procmail-dev@procmail.org.
The newest Procmail code is at <http://www.procmail.org/> and
ready-made packages are available in Linux distributions' repositories.
2.2 Procmail resources
Procmail is discussed in the Usenet newsgroup comp.mail.misc and
on the procmail mailing list, which is also accessible through the
Gmane NNTP server at <http://news.gmane.org/gmane.mail.procmail>.
2.3 Procmail mode for Emacs
If you use Emacs, see the
Procmail mode tinypm.el at
<http://freshmeat.net/projects/emacs-tiny-tools>. It can also
be used to statically syntax check recipes. Here is an example
of its output:
*** 1997-11-24 22:13 (pm.lint) 3.11pre7 tinypm.el 1.80
cd /users/jaalto/junk/
pm.lint:010: Warning, no right hand variable found. ([$`']
pm.lint:055: Pedantic, flag order style is not standard `hW:'
pm.lint:060: Warning, message dropped to folder, you need lock.
pm.lint:062: Warning, recipe with "|" may need `w' flag.
pm.lint:073: Warning, Formail used but no `f' flag found. |
2.4 Procmail module library project
2.4.1 Where to get various modules
- Procmail module library.
The idea of plug-in modules was originally coined by Alan
Stebbens (<alan.stebbens A T software.com>, <alan.stebbens
A T openwave.com>).
2.4.2 Terminology
subroutine/module = A piece of code that gets something in
INPUT and responds with OUTPUT. Subroutine is not message
specific.
recipe = A piece of code that is somewhat self
contained: It reads something from the message or does
something according to matches in message. Recipe may be
message-specific. Recipe is more free-form and does not
follow strict INPUT/OUTPUT methodology.
2.4.3 Foreword to using modules
In the module listing, some of the modules are recipes and
some can be considered subroutines. Let's take the address
exploder. First, visualise the following familiar programming
language pseudo code:
(ret-val1, ret-val2 ...) = Function(arg1, arg2, arg3 ...) |
A function may return multiple values, and multiple
arguments can be passed to it. Clear so far. The concept
applies to procmail modules like this:
RC_FUNCTION = $PMSRC/pm-xxx.rc # name the subroutine/module
RC_FUNCTION2 = ...
INPUT = "value" # Set the arg1 for module
INCLUDERC = $RC_FUNCTION # Call Function( $arg1 )
:0 # Examine function's return value
* ERROR ?? yes
... |
This should be pretty clear too. You just have to look into the
subroutine/module which you intend to use, to find out what
arguments it wants which you need to set (INPUT) before calling
it. The documentation also tells you what values are returned, e.g.
one of them was ERROR.
If it were a recipe, the call would be almost the same, but
instead of returning values, the recipe/module most likely does
something to your message or writes something to the data files
etc. A recipe works at a much higher level, because it may
call multiple subroutines/modules. The distinction between the
subroutine and recipe module types is not crystal clear, but I hope
the above clarifies the Procmail module/subroutine/recipe
concept a bit.
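For readers more at home in shell, the same calling convention can
be mimicked with a shell function. This is only an analogy, not
procmail code and not taken from any module: the function
extract_address and the variable ADDRESS are hypothetical, while
INPUT and ERROR play the roles described above.

```shell
# Hypothetical "module": reads INPUT, sets ERROR ("yes"/"no") and,
# on success, ADDRESS. The caller sets INPUT before the call and
# inspects ERROR afterwards, just like with INCLUDERC.
extract_address() {
    ERROR="no"
    ADDRESS=$(printf '%s\n' "$INPUT" |
        sed -n 's/.*<\([^<>]*@[^<>]*\)>.*/\1/p')
    [ -n "$ADDRESS" ] || ERROR="yes"
}

INPUT='Foo Bar <foo@example.com>'   # set the argument
extract_address                     # "call" the module
echo "ERROR=$ERROR ADDRESS=$ADDRESS"

INPUT='no address here'
extract_address
echo "ERROR=$ERROR ADDRESS=$ADDRESS"
```

The first call should leave ERROR at "no" with the address filled
in; the second should set ERROR to "yes" and leave ADDRESS empty.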
2.4.4 Header file modules
These are like #include .h files in C: they define common
variables, but do not contain actual code.
- pm-javar.rc – Defines standard variables: SPC WSPC NSPC SPCL and
perl styled \s \d \D \w \W and \a \A (alphabetic characters only)
- headers.rc – From Alan's procmail-lib. Define standard regexp
and macros: address, from, to, cc, list_precedence
2.4.5 General modules
- pm-jafrom.rc – Derive FROM field without calling formail
unnecessarily. If all else fails, use formail.
- get-from.rc – From Alan's procmail-lib. get the "best" From
address. Sets FROM and FRIENDLY, the latter being the "friendly"
user name sans address.
- pm-jaaddr.rc – Subroutine to extract various mail components
from INPUT. Like address=foo@example.com, net=com, account=foo...
- pm-jastore.rc – Subroutine for general mailbox delivery.
Define MBOX as the folder where to drop
message and this subroutine will store it appropriately.
Supports single mboxes, ".gz" mbox files, directory files and
MH folders with rcvstore.
2.4.6 Spam modules
Read "Thoughts about increasing spam annoyance" at
<http://pm-lib.sourceforge.net/README.html> which
explains these modules better in context "2.0 A lightweight
UBE block system with pure procmail".
- pm-jaube.rc – Subroutine to investigate the message
for known spam patterns like numeric address, invalid address,
Pegasus bulk mail, advertising slogans etc. This is the
generic spam detection module. Needs only one external
program: nslookup(1) to verify the sender's domain. The
results of classification appear in returned variables
that the caller can use for deciding what to do. Optional
headers can be added to the message to announce the
results.
- pm-jaube-keywords.rc – Subroutine to scrutinize the
message against known spam keywords. This is the "bare
bones" and very simplistic (but fast) way to check if a
message is spam. The results of classification appear in
returned variables that the caller can use for deciding
what to do.
- pm-jaube-prg-runall.rc – An interface module to call
external statistical Bayesian spam classifier programs.
This subroutine will call other modules, like
pm-jaube-prg-bogofilter.rc (for bogofilter),
pm-jaube-prg-bsfilter.rc (for bsfilter) and many more
that help fight spam. It is possible to activate only the
specific Bayesian programs available on the current host.
2.4.7 Mime modules
- pm-jamime.rc – Subroutine to read MIME headers and put the
mime version, boundary string, content-type information to
variables.
- pm-jamime-decode.rc – recipe to decode quoted-printable
or base64 encoding in the body.
- pm-jamime-kill.rc – Recipe for attachment killing: wipes out the
extra mime cruft leaving only the plain text. Applications for
killing: ms-tnef attachment (MS Explorer 7k),
HTML attachments (Netscape, MS Express) vcard (Netscape),
PCX attachment (Lotus Notes).
- pm-jamime-save.rc – Recipe for saving simple file attachment.
When you receive ONE file attachment in a message, this
recipe can save it in a separate directory. The content is
also decoded (base64,qp) while saving.
2.4.8 Filtering message body or headers
- pm-jadaemon.rc – Handle DAEMON messages by changing the subject to
reflect a) the error reason, b) to whom the message was originally
sent and c) what the original subject was. Store the
DAEMON messages in a separate folder.
- pm-jasubject.rc – Standardize Subject "Re32: FW: Sv: message"
or any other derivative to the de facto "Re: message"
- pm-janetmind.rc – [obsolete]
Reformat minder.netmind.com messages (no longer exists 2005).
The default 4k message is shortened to a few important lines.
2.4.9 Mailing list modules
- pm-jalist.rc – Subroutine to extract mailing list name from
message. Do you need to add a new recipe to your .procmailrc
every time you subscribe to new mailing list? If you do,
take a look at this module, which examines the message and
defines variable LIST to hold the mailing list name. You
can use it directly to save the messages adaptively to
correct folders. No more hand work and manual storing
of mailing list messages.
2.4.10 Miscellaneous modules
- pm-jaempty.rc – Check if message body is empty (nothing
relevant). Sets variable BODY_EMPTY to "yes" or "no"
accordingly.
- pm-janslookup.rc – Run nslookup on given address. If you
compose return address with "formail -rt -x To:" you can
verify if domain is registered before sending reply. Uses cache
for already looked up domains. This module is also used
by the pm-jaube.rc to verify the sender's domain.
- guess-mua.rc – Guess the Mail User Agent and set MUA:
MH,PINE,MAIL
2.4.11 Low-level Date and time handling
For these, you get the date string from somewhere, then feed
it to some of these subroutines:
- pm-jatime.rc – a low-level subroutine. Parse time "hh:mm:ss"
from variable INPUT
- pm-jadate1.rc – a low-level subroutine. Parse date
"Tue, 31 Dec 1997 19:32:57" from variable INPUT
- pm-jadate2.rc – a low-level subroutine. Parse ISO standard date
"1997-11-01 19:32:57" from variable INPUT
- pm-jadate3.rc – a low-level subroutine. Parse date
Tue Nov 25 19:32:57 from variable INPUT
- pm-jadate4.rc – Call shell command "date" once to construct RFC
"Tue, 31 Dec 1997 19:32:57" and parse the YY MM HH and other
values. You usually use this subroutine if you can't get the date
anywhere else.
2.4.12 Higher-level Date and time handling
You use these recipes to get the date directly from the message:
- pm-jadate.rc – higher-level recipe. Read date from message's
headers: From_ Received, or call shell date if none succeeds.
- date.rc – higher-level recipe.
From Alan's procmail-lib: parse date or from headers
Resent-Date:, Date, and From
2.4.13 Forwarding and account modules
- pm-japop3.rc – Pop3 movemail implemented with procmail. You can
send a "pop3" request to move your messages from account X to
account Y. Each message is sent separately. This recipe listens
for "pop3" requests.
- pm-jafwd.rc – control forwarding remotely. You can change the
forward address with a "control message" or turn
forwarding on/off with a "control message"
- pm-japing.rc – Send short reply when subject contains the word
"ping" to show that the account is up and mail address is
valid.
- correct-addr.rc – From Alan's procmail lib. To help forward mail
from an OLD address to a NEW address, and do some mailing list
mail management. This recipe file is intended to make it easy
for users to forward their mail from their old address to a new
address, and, at the same time, educate their correspondents
about it by CC'ing them with the mail.
2.4.14 Vacation modules
- pm-javac.rc – A framework for your vacation replies. This
recipe will handle the vacation cache and compose an initial
reply; which you only need to fill in. (Like putting vacation
message to the body)
- ackmail.rc – From Alan's procmail lib. procmail rc to
acknowledge mail (with either a vacation message, or an
acknowledgment)
2.4.15 Message-id based modules
- pm-jadup.rc – Handle duplicate messages by Message-Id.
Store duplicate message in separate folder.
- dupcheck.rc – From Alan's procmail-lib. If the current mail has
a "Message-Id:" header, run the mail through "formail -D",
causing duplicate messages to be dropped. Can use MD5 hash in
cache.
2.4.16 Cron modules
- pm-jacron.rc – A framework for your daily cron tasks. This
recipe contains all the needed checks to ensure that your
includerc is called whenever a day changes. (Day change is
subject to messages you receive). Your own cron includerc is
run once a day.
2.4.17 Backup modules
- pm-jabup.rc – Save messages to backup directory and keep only N
messages per day. Idea by John Gianni. Note:
The implementation will always call shell for each message you
receive; so using this module is not recommended if you get
many messages per day. Instead, use the cron module to clean
the messages' backup directory only once a day, and not every time
a message arrives.
2.4.18 Confirmation modules
- pm-jacookie.rc – Handle cookie (unique id) confirmations.
Also known as Procmail authentication service (PAS). This
simple procmail module will accept messages only from
users who have returned a "cookie" key. You can use this
to protect some services before access. Uses subroutine
pm-jacookie1.rc, which generates the unique cookie; CRC 32
by default. NOTE: Please read the page
<http://pm-lib.sf.net/README.html> before you start
thinking of using this module as a generic Challenge-Response
module to reduce spam.
2.5 Procmail code to filter UBE
Sysadmins, remember: spam filtering is done much more
efficiently in the MTA, especially if you are just
looking at From and To lines. For example, you can set up in
Exim a rule that blocks \d.*@aol\.com (that is, any aol.com
local part that begins with a digit). AOL guarantees that
none of their addresses begin with a digit. Exim rejects
such bogus addresses at the SMTP level before the message is
received.
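The pattern itself is easy to try out before it goes into an MTA
configuration. The sketch below is not Exim syntax; it checks sample
addresses with grep -E, writes the \d shorthand as [0-9] because
grep -E does not know \d, and anchors the pattern to the whole
address. The anchors, the function name is_bogus_aol and the sample
addresses are additions for this illustration.

```shell
# is_bogus_aol ADDRESS: succeed when the aol.com local part starts
# with a digit.
is_bogus_aol() {
    printf '%s\n' "$1" | grep -Eq '^[0-9].*@aol\.com$'
}

for addr in 123abc@aol.com abc123@aol.com 9lives@example.com; do
    if is_bogus_aol "$addr"; then
        echo "reject: $addr"
    else
        echo "accept: $addr"
    fi
done
```

Only the first sample address should be rejected: the second starts
with a letter and the third is not an aol.com address.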
- pm-jaube.rc – Procmail module library's UBE filter.
After Daniel Smith
posted his spam recipes to the procmail mailing list, the code
was adopted and generalized to handle a lot more UBE.
Module needs no special setup and can be installed via
simple INCLUDERC. All UBE detection happens using procmail
rules with no external files needed. The module is
available in Procmail module library at
<http://freshmeat.net/projects/procmail-lib>.
- Catherine A. Hampton's "Spambouncer".
The attached set of procmail recipes/filters, which I
call The Spam Bouncer, are for users who are sick of spam
(unsolicited junk mail) and want to filter it out of their
mail as easily as possible. These recipes can be used as
shared recipes for a whole system, or by an individual for
their own mailbox only.
- Junkfilter.
by Gregory Sutter. Junkfilter is a user-configurable
procmail-based filter system for electronic mail. Recipes
include checks for forged headers, key words, common spam
domains, relay servers and many others.
- Nonplussed Spambouncer
Procmail module for bouncing spam. Requires sendmail with
plussed users.
3.0 Dry run testing
3.1 What is dry run testing?
It means that you call your procmail test script directly with a
sample test mail:
% procmail $HOME/pm/pm-test.rc < $HOME/tmp/test-mail.txt |
The script pm-test.rc has the procmail recipe you're testing
or improving. The test-mail.txt is any valid mail message
containing the headers and body. You can make one with any
text editor, e.g. vi, pico, nano, emacs or xemacs.
Here's a simple test mail skeleton. Copy verbatim:
From: me@example.com
To: me@example.com (self test)
X-info: I'm just testing

BODY OF MESSAGE SEPARATED BY EMPTY LINE
txt txt txt txt txt txt txt txt txt txt |
Remember that you can define environment variables as well in
the dry run call. Here's an example where procmail just executes
the script and does nothing fancy.
% procmail VERBOSE=on DEFAULT=/dev/null \
~/pm/pm-test.rc < ~/txt/test-mail.txt |
Suppose the script prints something to log files, but you'd instead
like to get it all dumped to the screen. No problem: first find out
your tty value by calling tty at the shell prompt and pass
that on the command line. Here LOGFILE is pointed at the terminal,
so that the output of "LOG=" commands and procmail's own log
statements appears on the screen:
# `tty' tells what to fill in /dev/..
% procmail VERBOSE=on DEFAULT=/dev/null \
LOGFILE=/dev/pts/0 \
~/pm/pm-test.rc < ~/txt/test-mail.txt |
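The steps of this section can be collected into a small shell
sketch. It writes the test mail with the mandatory empty line
between headers and body, and attempts the dry run only when
procmail and the rc file are actually present. The rc file location
$HOME/pm/pm-test.rc is the one used above; the temporary file
handling and the header count are additions for the example.

```shell
# Write the test mail skeleton to a temporary file.
mail=$(mktemp)
cat > "$mail" <<'EOF'
From: me@example.com
To: me@example.com (self test)
X-info: I'm just testing

BODY OF MESSAGE SEPARATED BY EMPTY LINE
txt txt txt
EOF

# Sanity check: count header lines, i.e. everything before the
# first empty line.
header_count=$(sed '/^$/q' "$mail" | grep -c ':')
echo "test mail has $header_count header lines"

# Dry run, only if procmail and the test rc file are available.
rc="$HOME/pm/pm-test.rc"
if command -v procmail >/dev/null 2>&1 && [ -f "$rc" ]; then
    procmail VERBOSE=on DEFAULT=/dev/null "$rc" < "$mail"
else
    echo "procmail or $rc not available, dry run skipped"
fi
rm -f "$mail"
```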
3.2 Why is the From field not okay after a dry run?
Why does it now say "From foo@bar Mon Sep 8 14:38:06 1997"?
Don't worry about this. It's a side-effect of running the
message through formail after having generated any auto-reply:
the auto-reply generated by "formail -rt" doesn't have a
"From " header (it's pointless for outgoing messages), so the
second formail adds one, not knowing that it'll just be
ignored by sendmail later (well, sendmail will extract the
date from it, but that's ignorable). You only see it because
you're saving to a folder instead of mailing it.
3.3 Getting default value of a procmail variable
There's always this way to learn a variable's initial value
(note the single quotes, which keep the shell from expanding
the variable), which Stephen uses to get procmail's
value for $SENDMAIL in the scripts that build SmartList:
procmail LOG='$PATH' DEFAULT=/dev/null /dev/null < /dev/null |
Since LOGFILE hasn't been defined, $PATH will be printed to the
screen. One caution: if there are any variables in the definition
of $PATH (such as $HOME), they'll be expanded in the output.
4.0 Things to remember
4.1 Get the newest procmail
A lot of trouble surfaces only because you have an old
procmail version. Be sure to have the latest. If you're serious
about using procmail, pester your sysadmin or ISP until the latest
version is installed, and don't give up. Here is a command to
check your procmail version number:
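The version switch is -v. The sketch below guards the call so that
it also says something sensible on a host without procmail; the
guard and the variable name version_info are additions for this
example.

```shell
# Ask procmail for its version; keep only the first line of output.
if command -v procmail >/dev/null 2>&1; then
    version_info=$(procmail -v 2>&1 | head -n 1)
else
    version_info="procmail not found in PATH"
fi
echo "$version_info"
```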
4.2 Csh's tilde is not supported
Many shell users have become accustomed to using the tilde (~)
everywhere. Unfortunately procmail doesn't expand it to the home
directory; just use $HOME. When you write procmail
recipes, think sh, not bash. This mindset will
automatically get your brain tuned to the right programming
habits.
4.3 Be sure to write the recipe starting right
The recipe starts with :0 or just with : but the latter
one is somewhat dangerous and easy to miss. Beware of writing it
as 0: which happens easily. Always put a zero after the colon
that begins the recipe. In the first versions of procmail, you
would put the number of conditions there, with a default of 1. That
was annoying, and the computer can do the counting more easily, so
Stephen made it so that a count of 0 indicates that the
conditions are all the lines beginning with a *. The default
is one, unless the a, A, e, or E flag is given, in
which case the default is zero. ALWAYS START A RECIPE
WITH :0.
4.4 Always set SHELL
If your login shell is a C shell (csh or tcsh), avoid havoc:
as a precaution, always put the following at the top of your
$HOME/.procmailrc.
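A minimal form of that precaution is a single assignment. The line
below is valid both as a .procmailrc assignment and as plain sh; the
path /bin/sh is the usual location of the Bourne shell and is an
assumption about your system.

```shell
SHELL=/bin/sh
```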
4.4.1 If system has no /bin/sh and you're forced to use csh/tcsh
[<kuhlmav A T elec.canterbury.ac.nz>] Csh and tcsh execute the
.cshrc first, THEN if, and only if, it is the login shell (not
a sub shell) it executes the .login, which should contain
basic important system settings like stty commands. Likewise,
bash and ksh users are taught to define and export PATH in
.profile, so our per-shell startup files would not have
clobbered the PATH set in .procmailrc the way your .cshrc did.
[philip] ...I have been told by other sysadmins that there are
systems on which csh was hacked to source the .login before the
.cshrc. For various reasons I suspect these to be systems based on
older versions of BSD (say, 2.3 BSD).
As for tcsh, the order in which the .login and .cshrc is sourced is
a compile-time option which defaults to the .cshrc (or .tcshrc)
before the .login. There may be some wackos out there who change
the default in memory of the system(s) that they were raised on. I
suggest electroshock as the proper treatment.
...done sys admin on Crays, Convexes, Suns, SGIs, Decs, PC
running BSDI, Linux and Free BSD, and I have never run into a
system where the .cshrc is sourced AFTER the .login. If someone
goes to the trouble to change the order, I would love to know a
valid reason for it.
4.4.2 Procmail won't work well with SHELL set to a csh derivative
[1998-08-17 PM-L <kuhlmav A T elec.canterbury.ac.nz> Volker Kuhlmann]
...The blame lies with procmail and its documentation. Obviously,
procmail is programmed with the assumption that the login shell is
a sh derivative. This assumption is a) not very nice, and b) not
stated in the otherwise very good documentation. Of course a user
can set SHELL to tcsh. If then procmail is too stupid to hack it,
it ought to say so clearly, and the above-mentioned questions of
people using tcsh will disappear from this list. One could also be
nice and point out pitfall (3) mentioned above in the procmail
docs. It is customary to have terminal configuration in .login. If
it is shifted to .cshrc it should be properly surrounded by if ..
endif. Perhaps it is not customary to configure the terminal in
.bashrc (where else then? - only a rhetorical question), but that
is no reason to blame it on tcsh.
My .cshrc only setenvs the environment when it is a login shell
(shell level 1). Obviously procmail runs a login shell. As I said
earlier, there are good reasons for setting a full PATH
independently whether the shell is interactive or not. So, when
procmail executes programs with SHELL=tcsh, PATH is set to the tcsh
defaults. That may or may not be desirable, depending on the
individual case. No problem with that and avoidable (run tcsh with
-f). Nice if it was in the procmail docs.
But then, the PATH getting clobbered is not the point here (just a
side-effect I didn't realize until 2 people pointed it out).
4.5 Check and set PATH
It is very likely that the default PATH environment variable
that your $HOME/.procmailrc sees is not enough. To play
safe, so that all the needed binaries can be found when
escaping to the shell in .procmailrc, set the PATH variable as the
very first statement. Adding paths that exist on one
system but not on another makes it possible to use
the same $HOME/.procmailrc on multiple servers (like HP, SUN,
IBM, Linux).
PATH = \
$HOME/bin:\
/usr/local/gnu/bin:\
/usr/contrib/bin:\
/usr/local/bin:\
/opt/local/bin:\
/bin:\
/usr/bin:\
/usr/lib:\
/usr/ucb:\
/usr/sbin:\
/vol/bin:\
/vol/lib:\
/vol/local/bin:\
${PATH}
4.6 Keep the log on all the time
It's best to put these variables at the very start of
your .procmailrc. When you start using procmail, you
want to know at all times what is happening and why your
recipes didn't work as expected. The answer to almost all your
questions can be found in the log file. As the log file will
grow quite big, remember to set up a cron job to keep it
at a moderate size.
LOGFILE = $PMSRC/pm.log
LOGABSTRACT = "all"
VERBOSE = "on"
4.7 Never add a trailing slash for directories
Drop the trailing slash: it'll choke if you ever end up on
Apollo's DomainOS where double slashes are network references.
If the directory has a trailing slash, it will choke on most
OSes (they treat it like "/.").
DIR = /full/path/to/www/directory/ # Wait...
FILE = $DIR/file # Ouch ! Expands to directory//file
4.8 Remember what term DELIVERED means
When procmail delivers a piece of mail, whether to a file or a
pipe-command, if the write succeeds, then the mail is
considered to have been delivered, and processing stops with
that recipe file. Here is the relevant text from the man page:
...There are two kinds of recipes: delivering and non-delivering
recipes. If a delivering recipe is found to match, procmail
considers the mail (you guessed it) delivered and will cease
processing the rcfile after having successfully executed the
action line of the recipe. If a non-delivering recipe is found to
match, processing of the rcfile will continue after the action
line of this recipe has been executed.
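As a minimal sketch of the difference (the folder names and the
Subject pattern are made up for illustration): a mailbox action with
the c flag behaves as non-delivering, while a plain mailbox action is
delivering.

```procmail
# Non-delivering: with c, procmail saves a copy and keeps going
:0 c:
everything.mbox

# Delivering: on a match and a successful write, processing of
# the rcfile stops here for this message
:0:
* ^Subject:.*report
reports.mbox

# Only reached by mail that was not delivered above
:0:
rest.mbox
```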
4.9 Beware of putting comments in wrong places
You like commenting a lot, sticking comments everywhere possible?
Yes, I do that too, and got into trouble because one is not that
free to comment code in procmail. Pay attention to the following
example:
:0 # comment ok
* condition # OUCH, ouch. This comment must not be here.
# Hm, Old procmail versions don't understand this
# Are you sure you want to put comments inside
# condition line?
* condition
{ # comment ok
# comment ok
:0 # comment ok
/dev/null # comment ok
} # comment ok
So, the place to watch is the condition line. Later procmail
versions may understand those, but if you intend to share your
recipe, play it safe and think about backward portability.
4.10 Brace placement
Be careful with your braces and remember that old procmail
versions aren't as forgiving as newer versions. Below you see the
classical "Test OK condition first, and if that fails then do
something else". See the side comments.
:0
* condition
# No space allowed here!
{} # Wrong, at least _one_ empty space
:0 E
{do_something } # Again mistake, must have surrounding spaces
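For comparison, here is a sketch of the same structure with the
whitespace in place. The variable RESULT is only a placeholder for
whatever the else-branch should do.

```procmail
:0
* condition
{ }                     # empty block, one space between the braces

:0 E
{ RESULT=no-match }     # whitespace after "{" and before "}"
```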
4.11 Local lockfile usage
Lock files are only needed when procmail is doing something that
should be serialized, i.e., when only one process at a time should
be doing it.
This generally means that any time you write to a file, you should
have a local lock, preferably based on the name of the file being
written to. Forwarding actions ('!'), and 99% of all filters don't
need lock files. However, if a filter action writes to a file while
filtering, then you may need a lock. Procmail always does kernel
locking when it writes mail to files via simple file actions. So
even if you forgot the lock colon, procmail tries to play safe if
kernel locking has been compiled in.
Beware misplacing the lock colon(:)
:0: a # Ouch! Wrong unless you want a lock file named a
:0 a: # Okay.
Note that in delivering recipes where you manually write the
content, you must name the local lock file yourself when using
the > token, because procmail can't determine the lock by itself.
It can only determine the lock file from the >> token. However,
putting a lock file on a recipe like this is, of course, utterly
useless. So you might as well omit the locking entirely.
# Save the body of the latest message to file mail.body
:0 b: mail.body$LOCKEXT
| cat > mail.body
- If the command line in the procmail rcfile contains ">>",
a name for the local lock file will be implicit, and the second
colon alone is enough.
- If the command doesn't write to a file, or doesn't write to the
same file as anything else (including a matching letter that makes
procmail run the same command) that might run at the same time,
the local lock file is unnecessary.
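The two cases in the list above can be sketched as follows. The file
name, the Subject pattern and the forwarding address are invented for
the example.

```procmail
# ">>" names the file, so the bare second colon is enough:
# procmail derives the lock file name from headers.log
:0 hwc:
| cat >> headers.log

# Nothing is written to a shared file, so no lock is needed
:0 c
* ^Subject:.*urgent
! pager@example.com
```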
Watch this too. A nesting block that does not launch a clone
cannot take a local lock file on the recipe that starts the
braces. A nesting block that does launch a clone can. (see the
error)
:0: file$LOCKEXT
{
# error: "procmail: Extraneous local lock file ignored"
# - This lock file will be ignored
# - If the recipes inside the braces try to use file.lck
# as a lock file, then you'll have a deadlock situation.
:0 :
/tmp/tmp.mbx
}
Let me also explain why the w is so important. Notice that the
two recipes here are equivalent: the W is implicit. NOTE: this is
only true on the recipe that opens a nested block. On a recipe with
a program, forward, or delivery action, W is different from w,
which is different from using neither.
:0 c: file$LOCKEXT          :0 Wc: file$LOCKEXT
{ ... }                     { ... }
To quote the comment in source code, "try and protect the user from
his blissful ignorance". The parent will always wait for the cloned
child to exit when a lock file is involved. The only question is
whether or not it should be logged. If you want failure of the
cloned child to be logged, then you should use the w flag, like this:
:0 wc: file$LOCKEXT
{ ... }
A local lockfile can be used to lock a clone; the parent procmail
will remove it when the clone exits (thus it serves as a global
lock file for the clone). If the braced block does not launch a
clone, asking for a local lock file generates an error.
4.12 Global lockfile
If you want to block everything while the recipe runs, even
during the conditions, use a global lock. For example, in this
construct the formail call, which updates the message-id cache
file, must be protected with a global lock file.
MID_CACHE_LEN = 8192
MID_CACHE_FILE = $PMSRC/msgid.cache
MID_CACHE_LOCK = $PMSRC/msgid.cache$LOCKEXT
LOCKFILE = $MID_CACHE_LOCK
:0
* ^Message-ID:
* ? $FORMAIL -D $MID_CACHE_LEN $MID_CACHE_FILE
{
LOG = "dupecheck: discarded $MESSAGEID from $FROM $NL"
:0 # no lockfile !
$DUPLICATE_MBOX
}
LOCKFILE # kill variable
You cannot use a local lockfile as below:
:0 : $MID_CACHE_FILE$LOCKEXT
* ^Message-ID:
* ? $FORMAIL -D $MID_CACHE_LEN $MID_CACHE_FILE
because the local lock file named on the flag line will be created
only if the conditions have matched and the action is attempted.
One more note: observe carefully that there is no : lock when
delivering to DUPLICATE_MBOX because the outer global lock file
already prevents all other procmail instances from executing this
part of the recipe.
4.13 Gee, where do I put all those ! * $ ??
Ahem. I can't tell you exactly what to do or how to write your own
procmail recipes, but I can show you an example. Here is one possible
style for condition line token order:
That won't say much unless you see something to compare with. Here
is one perfectly valid rule, but not in the above style.
:0
*$ ^Subject:.*$VAR
*! ^From:.*some
*B ! ?? match-the-string-in-body
*$? $IS_EXIST $FILE
*VARIABLE ?? set
It might be better to line things up in condition lines. The first
column is reserved for the dollar sign, the second for the not
operator and so on. The key here is that it is possible to see at a
glance if there is a variable expansion dollar in the line (leftmost).
:0
*$   ^Subject:.*$VAR
*  ! ^From:.*some
*  ! B ?? match-the-string-in-body
*$ ? $IS_EXIST $FILE
*    VARIABLE ?? set
 | | |
 | | What is matched: (H)eader portion, (B)ody or (HB) both.
 | | The (??) associative operator is required.
 | |
 | Not operator (!) or shell call (?)
 |
 Variable expansion (important)
4.14 If you send an automatic reply, use X-Loop header
Do not send an automatic reply without checking the "! ^FROM_DAEMON"
condition, and always include an X-Loop header and check for its
existence to prevent mail loops:
:0
* conditions-for-auto-reply
*$ ! ^$MYXLOOP
* ! ^FROM_DAEMON
| $FORMAIL -A "$MYXLOOP" ...other-headers...
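A fuller sketch of such an auto-reply might look like the following.
The address in MYXLOOP, the Subject pattern and the reply text are
placeholders, and the pipeline follows the pattern of the
auto-reply examples in the procmailex man page; adapt it before use.

```procmail
MYXLOOP = "X-Loop: me@example.com"      # hypothetical address

:0 hc
* ^Subject:.*send info
*$ ! ^$MYXLOOP
*  ! ^FROM_DAEMON
| (formail -rt -A "$MYXLOOP"; \
   echo "This is an automatic reply.") | $SENDMAIL -oi -t
```

The reply carries the X-Loop header, so if it ever comes back to this
account the second condition stops the recipe from answering again.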
4.15 Avoid extra shell layer and check command for SHELLMETAS
[dan] It is very important to study your shell command calls and try to
save the overhead of the extra layer of shell. It may be extra work
once when you write your rcfile but it saves effort on each piece of
arriving mail. When procmail sees a character from SHELLMETAS, it
runs
# Default SHELLMETAS: &|<>~;?*[
# Default $SHELLFLAGS: -c
% $SHELL $SHELLFLAGS "command -opts args"
instead of executing the command directly.
That is because procmail's ability to invoke other programs does not
include filename globbing ([, *, ?), backgrounding (&), piping
(|), succession (;), nor conditional succession (&&, ||). If it
sees any of those characters (before expanding variables), it hands the
job over to a shell.
Sometimes those characters appear in arguments to a command without
having their shell meta meaning and procmail really could invoke the
command directly without the shell. You can see the distinction in a
verbose log file: if procmail runs the command itself, it logs
Executing "command,-opts,args"
with a comma between each positional parameter, but if it calls a
shell, the original spacing from the rcfile appears unchanged in
the logfile:
Executing "command -opts args"
So, if you know you won't be needing shell expansion, wrap your
shell calls with this:
savedMetas = $SHELLMETAS
SHELLMETAS # Kill variable
..command that does not need shell expansion features..
SHELLMETAS = $savedMetas
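As a concrete sketch of that wrapper: the header text below is
invented, and its "?" would normally send the whole line to a shell
even though nothing in it needs one.

```procmail
savedMetas = $SHELLMETAS
SHELLMETAS                      # unset: no character triggers a shell

# The "?" is plain text here, not a glob
:0 fhw
| formail -I "X-Note: spam?"

SHELLMETAS = $savedMetas
```

With VERBOSE on, the "Executing" line in the log should show whether
procmail ran formail itself or handed the line to a shell.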
4.16 Think what shell commands you use
For every message, procmail launches the processes you have
put into your $HOME/.procmailrc. If you haven't paid
attention to optimization before, now is the time to
take a magnifying glass and check every recipe and the
processes in them. When you write your private shell scripts,
the performance hit is not so important, but for mail
delivery, the matter is totally different. First, let's see
some programs and sizes. The following is from one Unix
system, where the binaries include debug and symbol table
code.
131072 /usr/bin/awk
196608 /usr/bin/sort
245760 /usr/bin/grep
262144 /usr/bin/sed
303552 /usr/local/bin/gawk
544768 /usr/contrib/bin/perl [perl 4.36]
822232 /opt/local/bin/perl
            text     data      bss      total
awk:       72727 +  51316 +  15317 =   139360
sort:     173225 +  18496 + 183076 =   374797
sed:      237248 +  16992 +  56252 =   310492
grep:     221591 +  16176 +  53816 =   291583
perl4:    502220 +  36044 +  65632 =   603896
perl5:    633812 +  69612 +   2385 =   705809
gawk:     160018 +   5264 +   7168 =   172450
The binary sizes above are not the typical cases. These are from
another system:
4 Sep 28 /usr/local/bin/awk -> gawk
32768 Nov 16 /usr/bin/grep
49152 Nov 16 /usr/bin/sed
114688 Oct 20 /usr/local/contrib/gnu/bin/grep
155648 Nov 16 /usr/bin/awk
155648 Nov 16 /usr/bin/nawk
221184 Nov 16 /usr/bin/gawk
311296 Jan 27 /usr/local/bin/gawk
958464 Nov 2 /usr/local/contrib/bin/perl
1196032 Sep 14 /usr/local/bin/perl
Stan Ryckman <stanr A T sunspot.tiac.net> wants you to know that:
Comparing byte sizes on disk means nothing here... these
things may or may not have been stripped. Any symbol tables included
in the byte counts you see above won't affect process start-up time.
The size command will give a better handle on what will be needed
in starting a process. The three segments may each have their own
overhead, though, and the relative contributions of those segments
to startup time may well be system-dependent.
Hm. Can we draw some conclusion? Not anything definitive, but at
least something:
- While sed(1) and grep(1) may be bigger than awk(1)
in some systems, this is an exception. They are usually
much smaller. It's more effective to use one awk process
instead of many combined filtering commands.
- Complex commands that would require many processes to be
chained together, like `grep -v | grep | sed', can
usually be accomplished with one awk(1) call. Ask somewhere
how to do it with awk(1) if you don't know the language;
it is quite similar to perl(1).
- Try to use standard awk(1). gawk(1) and nawk(1)
are bigger and may not be found on all systems.
- Avoid perl(1) at all costs; it's many times (6) bigger than
awk(1). Perl is slow to start up, due to the intermediate
compilation process at startup, and hogs oodles of memory.
- Remember that if procmail is running on a dedicated mail host, it
probably doesn't even have any goodies installed, just the boring
standard versions, which may not even be the same as what you see
on the current host.
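To illustrate the second point, here is a sketch of a three-stage
pipeline folded into one awk call. The ORDER: marker and the file
orders.log are invented for the example, and sub() assumes a POSIX
awk; a very old awk may lack it.

```procmail
# Three processes per message:
#   | grep -v '^>' | grep 'ORDER:' | sed 's/ORDER: *//' >> orders.log
#
# One awk process doing the same work:
:0 bwc:
| awk '!/^>/ && /ORDER:/ { sub(/ORDER: */, ""); print }' >> orders.log
```

Because of the ">>" redirection procmail still starts a shell for this
line, so the saving is in the two filter processes that are no longer
needed.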
Here are some more programs. Don't even think of extracting fields with
grep or awk, like "grep Subject", because formail is
much smaller and more optimized for tasks like that. Better yet,
many times you can do all with procmail's regexp matches.
37007 Sep 5 15:53 /usr/local/bin/formail # 3.11pre7
28672 Jun 10 1996 /usr/bin/tr
20480 Jun 10 1996 /usr/bin/tail
20480 Jun 10 1996 /usr/bin/cat
20480 Sep 26 1996 /usr/bin/expr
16384 Jun 10 1996 /usr/bin/head
16384 Jun 10 1996 /usr/bin/cut
16384 Jun 10 1996 /usr/bin/date
16384 Jun 10 1996 /usr/bin/uniq
16384 Jun 10 1996 /usr/bin/wc
12288 Jun 10 1996 /usr/bin/echo
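As a sketch of doing the extraction with procmail's own regexp
matching: the text to the right of the \/ token is stored in the MATCH
variable. The name SUBJECT is just an example, and for simple
single-line headers the result should be roughly what formail would
return.

```procmail
# Costs a fork on every message:
#   SUBJECT = `formail -zxSubject:`
#
# No external process at all:
:0
* ^Subject:[ ]*\/.*
{
  SUBJECT = "$MATCH"
}
```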
4.17 Using absolute paths when calling a shell program
Shell programmers know that if an absolute path is used for calling
the executable, the shell doesn't have to search through a long list of
directories in $PATH. This may speed up shell scripts remarkably.
The best way to use such an optimization is to define variables for
those programs.
Should you use such optimization in your procmail code? That is a
two-fold question. Examine how many shell calls you use. Do
you use grep or formail a lot? Then you could optimize these
calls. To be portable, define variables for executables:
# perhaps defined in separate INCLUDERC
#
# INCLUDERC = $PMSRC/pm-mydefaults.rc
FORMAIL = /usr/local/bin/formail
GREP = /bin/grep
DATE = /bin/date
:0 fhw
| $FORMAIL -rt
When you port your .procmailrc to a different environment which
has different paths, you could use this recipe in addition to the one
just mentioned above:
FORMAIL = ...as above
:0
* HOST ?? second-host
{
# In this host the paths are different. Reset.
FORMAIL = "formail"
GREP = "grep"
DATE = "date"
}
4.18 Disabling a recipe temporarily
If you have a recipe that you would like to disable for a while,
there is an easy way. Just add the "false" condition line before
any other conditions. The "!" also nicely visually flags that
"this recipe is NOT used".
# This recipe stops at "!" and doesn't get past it.
:0
* !
* condition
* condition
{
...
}
4.19 Keep message backup, no matter what
It's good to have a safety measure in your .procmailrc.
Although you are an expert and have checked your recipes 10 times,
there is still a chance that something breaks. One morning, when you
browse your BIFF reminder log; you notice "Hm, there is that
interesting message but it was not filed, where is it?". And when
you go to study the procmail logs (you do keep the log going all
the time) and it hits you: "Gosh; a mistake in my script! Message was
fed to malicious pipe and I had that i flag there... sniff".
And you greatly regret you didn't back up the message in the first
place.
So, before your procmail does anything to your message, put the
message into some folder which is regularly expired. Emacs Gnus can
expire mailboxes, but one could also use a cron(1) job to do the
cleaning. After that, you can relax knowing your mail is safe.
# Your incoming messages are stored here, filtered by procmail
SPOOL = $HOME/Mail/spool
# Backup storage
#
# - This could be directory too. In that case you could use
# cron job to expire old messages at regular intervals
# - For once a day expiration, see procmail module list
# and pm-jacron.rc
BUP_SPOOL = $SPOOL/junk.bup.spool
:0 c:
$BUP_SPOOL
Naturally you can filter out mailing list messages from the backup,
because losing one or two (hundred) of them may not be that serious.
Maybe you could use two backup spools, one for mailing lists and the
other for your non-list messages.
:0 c:
* ! mailing-list1|mailing-list2
$BUP_SPOOL
If you have the date variables set up as described below, you
could also create a backup folder per day:
BUP_SPOOL = $SPOOL/junk.bup.$YYYY-$MM-$DD.spool
This makes it very easy to delete backups that are older than
a given number of days, either manually or through a cron job.
4.20 Order of the procmail recipes
When you start writing a lot of procmail recipes, you soon find out
that it matters a great deal in which order you put your recipes.
When each group of recipes starts growing too big, it's good
practice to move each group to a separate includerc file. Here is
one recommended order in which your calls appear in the main
$HOME/.procmailrc:
- backup important messages
- cron-subroutine
- handle duplicate messages
- handle DAEMON MESSAGES
- handle plus addressed message (RFC plus or sendmail plus addresses)
- handle server requests (file server, ping responder...)
- drop MAILING LIST messages
- send possible vacation replies only after all above
- apply kill file
- detect mime and format or modify the message body
- save private messages
The backup, cron and duplicate handling go naturally to the
beginning of your .procmailrc. Next comes a grey area where
Daemon, plus handling and server messages can be put.
Mailing lists should be handled as early as possible, but after the
server messages, because you want your services handled first.
Do not send vacation replies before you have handled mailing lists
to prevent annoying vacation replies to mailing lists.
After that you are left with "known" private messages and those of
unknown origin. A kill file (to block based on sender) for rapid
spammers, who send you a message or several per day, may need to be
checked before checking other messages.
Last but not least: put your UBE checkers at the end to avoid
mishits of valid mail. DO NOT SEND AUTOMATIC COMPLAINTS
BACK, or you'll get grey hairs when the autoresponder sends its
complaint to a valid source. You don't want to answer back with "My
apologies, the script had an error, it won't happen again" to all
the valid angry mails that are now addressed to you.
Drop the UBE to a folder, manually select the messages that need
action and send a message to the postmasters in the Received chain,
explaining that their mail relay has been hijacked.
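Put together, the ordering described in this section could be laid
out like this in the main rcfile. All the rc file names are invented
for illustration; only the order matters.

```procmail
PMSRC = $HOME/.procmail              # assumption: where the rc files live

INCLUDERC = $PMSRC/rc-backup.rc      # backup important messages
INCLUDERC = $PMSRC/rc-cron.rc        # cron subroutine
INCLUDERC = $PMSRC/rc-dupes.rc       # duplicate messages
INCLUDERC = $PMSRC/rc-daemon.rc      # daemon messages
INCLUDERC = $PMSRC/rc-plus.rc        # plus addressed messages
INCLUDERC = $PMSRC/rc-server.rc      # server requests
INCLUDERC = $PMSRC/rc-lists.rc       # mailing lists
INCLUDERC = $PMSRC/rc-vacation.rc    # vacation replies
INCLUDERC = $PMSRC/rc-killfile.rc    # kill file
INCLUDERC = $PMSRC/rc-mime.rc        # mime handling
INCLUDERC = $PMSRC/rc-private.rc     # known private messages
INCLUDERC = $PMSRC/rc-ube.rc         # UBE checkers last
```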
5.0 Procmail flags
5.1 The order of the flags
The order of the flags does not matter in practice, but here is one
stylistic suggestion. The idea here is that the most important
flags are put to the left, like giving priority 1 for aAeE, which
affect the recipe immediately. Priority 2 is given to flag
f, which tells if a recipe filters something. Also (h)eader and
(b)ody should immediately follow f, this is considered priority
3. In the middle there are other flags, and last flag is c, which
ends the recipe, or allows it to continue. In addition according to
[david]: "...I'm quite sure that putting anything other than the
opening colon and the number to the left of AaEe will cause an
error."
:0 aAeE HBD fhb wWir c: LOCKFILE
   |    |   |   |    |
   |    |   |   |    (c)ontinue or (c)lone flag last.
   |    |   |   (w)ait and other flags
   |    |   (f)ilter flag and to filter what: (h)ead or (b)ody
   |    (H)eader and (B)ody match, possibly case sensitive (D)
   |    Note: Procmail 3.22 bug
   |    <http://mailman.rwth-aachen.de/pipermail/procmail/2002-February/008355.html>
   The `process' flags first. (A)nd or (E)lse recipe
You can write the flags side by side:
:0 Afhw: $MYLOCK$LOCKEXT
Or, as suggested, leave flags in their own slot for more
distinctive separation. Note that procmail variable $LOCKEXT must
be next to $MYLOCK, because it contains string ".lock".
:0 A fhw: $MYLOCK$LOCKEXT
5.2 Flags HB at top of recipe (warning)
[philip] Version 3.22 has a bug that keeps the 'H' flag from
being cleared, such that once you use it, it never gets
cleared. Using the 'H' flag will therefore cause problems with
later recipes that use just the 'B' but not the 'H' flag.
Either way, the only time you should use the 'H' flag is on
recipes that need to match against both the header and the
body. If you want a recipe to match only against the body and
you're using 3.22, use the "B ??" modifier on the conditions.
See Procmail-L message titled
"Cannot get recipies to work properly".
So to be as portable as possible, convert all previously used
condition lines from:
:0 B
* body-check-here
to use this format:
:0
* B ?? body-check-here
5.3 Flag w and recipe with pipe(|)
[alan] If the filter program exits with a 0 status (0 == okay), then
procmail will replace the original input body with the output of the
filter program. If the filter program exits with anything but zero,
procmail will report an "error" to the log, and "recover" the input
(not filter it)
[david] I am very sure that that's the case only if you have the
w or W flag on the filtering recipe. Without w or W,
procmail won't care about a bad exit status from the filter and will
replace the filtered portion with whatever standard output the
filter produced. It may still report an error to the log but it
won't recover the previous text. This, for example, will destroy the
body of a message, even without i:
:0 fb
| false
With this, however, procmail will recover the original body:
:0 fbW # same results even if we add `i'
| false
[stephen] No, not on all occasions. Procmail will not care about the
exit code here. However, if procmail detects a write error, it will
recover (because of the missing i flag). Procmail will only detect
a write error in such a case if the mail is long enough and does not
fit in the pipe buffer that's in the kernel (typically 10KB).
5.4 Flag w, lock file and recipe with pipe(|)
[manual] In order to make sure the lock file is not removed until the
pipe has finished, you have to specify option w; otherwise the
lock file would be removed as soon as the pipe has accepted the
mail. So if you see anything that looks like ">" or ">>" in your
recipe, that should immediately ring your bells. Immediately
check that you have included the w flag and the lock file colon (:).
:0 hwc: headc$LOCKEXT
* !^FROM_MAILER
| uncompress headc.Z; cat >> headc; compress headc
5.5 Flag f and w together
The w tells procmail to hang around and wait for the script to
finish. Hm, wouldn't you think this ought to be implied by the f
flag already?
[david] Of course the f flag is enough to make procmail wait for
the filter to finish, but the w means something more: to wait to
learn the exit code of the filtering command. If sed fails with a
syntax error and gives no output, without W or w procmail would
happily accept the null output as the results of the filter and
go on reading recipes for the now body-less message. On the other
hand, with W or w procmail will respond to a non-zero exit code by
recovering the unfiltered text.
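A small sketch of the two variants, using a made-up sed expression
that strips one level of "> " quoting from the body. Use one form or
the other, not both in sequence.

```procmail
# Without w: if sed dies, its (empty) output replaces the body
:0 fb
| sed -e 's/^> //'

# With w: a non-zero exit makes procmail keep the original body
:0 fbw
| sed -e 's/^> //'
```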
5.6 Flags h and b
[david] hb is the default; you need to use h only when you
don't want b or vice versa. You can think of it this way: h
means "lose the body" and b means "lose the header," but the two
together cancel each other out.
[philip] hb (feeding whole message) is the default for actions.
You need to specify h without b if you want the action applied
only to the head. H is the default for conditions. You need to
specify HB or BH if you want to test a condition against the
entire message.
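The distinction between the action flags (h, b) and the condition
flags (H, B) can be sketched like this. The log files, the folder and
the keyword are placeholders.

```procmail
# Action flags: only the header is fed to the pipe
:0 hwc:
| cat >> headers.log

# Action flags: only the body is fed to the pipe
:0 bwc:
| cat >> bodies.log

# Condition flags: search header and body for the keyword,
# then deliver the whole message (hb is the default action)
:0 HB:
* keyword
keyword.mbox
```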
5.7 Flag h and sinking to /dev/null
When you drop something to /dev/null, use the h flag so that
procmail does not unnecessarily try to feed whole message there.
:0 h
* condition
/dev/null
[philip] Procmail knows that it shouldn't create a local lock on
/dev/null and that it shouldn't kernel lock /dev/null, and it knows
to write it "raw" (no "From " escaping or appended newline). This
means that procmail simply opens /dev/null, does its write with one
system call, and closes it. I'm not sure if adding the h flag
makes a real difference on modern UNIX kernels. I suppose it
depends on how optimized the write() data is and in particular,
whether a user-space to kernel-space copy is required, or whether
it's delayed. If it's delayed then the code for handling /dev/null
would presumably not do it, and the size of the write wouldn't
actually matter.
5.8 Flag i and pipe flag f
Flag i is useless in mailbox deliveries.
[FAQ] The following will work some of the time, when the message is
short enough, but that's a coincidence. With a longer message,
though, Unix starts paying attention to what is happening, because
it will have to buffer some of the data, and then when the buffered
data is never read, an error occurs. The error is passed back to
Procmail, and Procmail tries to be nice and give you back your
original message as it was before this malicious program truncated
it. Never mind that in this case you wanted to truncate the
data. Anyway, the fix is easy: just add an i flag to the recipe
(:0 fbwi instead of :0 fbw) to make Procmail ignore the error.
:0 fbw
* condition
| malicious-pipe
[dan] Here's why the i flag is needed (courtesy of Stephen): You
told procmail to filter the entire mail (header and body), so it
does and it attempts to write out header and body to the filter.
Then procmail notices that not the entire body is being consumed.
Procmail, being rather paranoid when it comes to delivery of mail
assumes something went wrong and considers this a failure of the
filter.
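A typical case, as a sketch: a filter that deliberately stops reading
early. The choice of head and the 20-line limit are only an example.

```procmail
# head exits after 20 lines, so on long bodies procmail cannot
# write everything; i tells it to ignore that write error
:0 fbwi
| head -n 20
```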
5.9 Flag r
[philip] Procmail automatically turns on the r (raw mode) flag for
deliveries to /dev/null, so there's no need to do it yourself.
:0 r # you can leave out the `r'
* condition
/dev/null
[david] You can use the r flag (for raw mode) on every recipe
where you do not want a From_ line added. I'm assuming that there
isn't one already there; the r flag keeps procmail from making
sure that there are a From_ line at the top and a blank line at the
bottom, but it will not make procmail remove them if they are
already present. Also, be careful to use the -f option on all
calls to formail so that formail won't add a From_ line.
Someone who didn't need From_ lines – I forget who – found it
annoying to put r onto every recipe and altered the source to
prevent procmail from adding From_ lines at all, ever. I think a
better idea would be a procmailrc Boolean to enable or disable them
for all recipes without affecting other users. (Then perhaps we'd
need a reverse r flag to undo raw mode for one recipe at a time?)
5.10 Flag c's background
...Interesting. My vision of c is to think of CONTINUE
with message processing afterwards even if conditions matched.
[david] Precisely: when you have braces, thinking "continue"
instead of "copy" or "clone" can get you into trouble.
Early versions of procmail, before braces and before cloning,
called the c flag "continue" in their documentation; I think it
is still called that in the source.
When Stephen introduced braces (but not cloning at this point), it
was of course implicit that an action line of "{" was
non-delivering, and a c was extraneous. People put c's there
because they wanted procmail to continue to the recipes inside the
braces on a match, and procmail brushed it off with an "extraneous
c-flag" warning. No harm done.
When Stephen introduced cloning, though, I was rather upset that he
was giving double duty to c instead of introducing something new
like C for it, especially because people who absolutely wanted no
clone but intended the recipes inside the braces to run in the same
invocation of procmail as everything else were mistakenly putting
c's on their braces to make sure procmail would "continue". People
would (and did) get double deliveries.
Roman Czyborra, though, said that if you consider c to stand for
"copy", that covers both uses of c: provide a copy to a simple
recipe or, if there are braces, to a clone procmail that will
handle the recipes inside the braces. Stephen agreed and changed
the documentation accordingly.
Longtime users of procmail and people who read old docs may still
think of it as "continue", but since the introduction of clones,
that is not a good way to look at it. "Copy" is much safer.
5.11 Flag c before nested block forks a child
[alan] The combination of a nested block and the c flag causes
procmail to fork a child process for the nested block, while the
parent skips over it and continues on. The child process doesn't
necessarily stop unless a delivering recipe (without the c flag)
action succeeds.
[david] When Stephen van den Berg added the use of c on a
recipe that launches a braced nesting block to fork a clone
procmail, I objected that it should have a different flag, such as
C, because people were always putting c on recipes that open
braces because they thought it was necessary to make procmail
continue into the braced area. Until then, it had been a harmless
error for an extraneous flag. Roman Czyborra came up with the idea
of changing the meaning of c from "continue" to "copy": read c
as "send a copy to the action" and then it would cover both
simple recipes with c flags and cloning blocks.
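A sketch of such a cloning block follows. The Subject pattern, the
added header and the folder names are made up for the example.

```procmail
:0 c
* ^Subject:.*announce
{
  # Runs in the clone only
  :0 fhw
  | formail -I "X-Archived: yes"

  :0:
  announce.mbox
}

# The parent arrives here with the original, unmodified message
:0:
inbox.mbox
```

Reading c as "copy" fits: the clone gets its own copy to modify and
deliver, and the parent carries on with the message as it was.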
5.12 Flag c and understanding possible forking penalty
... I run shell commands that need not be serialized, so
instead of doing the standard way:
:0 hic # nbr.1 / standard way
| command
I assume I can avoid the extra fork caused by (c)lone flag
altogether by using these. Any difference between these two?
:0 # nbr.2 / alternative
* ? command
{ } # ...No-op, Procmail syntax requires this
dummy = `command` # nbr.3 / alternative
[philip] There is a misunderstanding here. Let me clarify:
Procmail only forks a full-blown clone on a recipe with the 'c'
flag whose action is a nested block.
If it's a simple mailbox delivery, pipe, or forward action then
procmail does not fork a 'clone' (for pipe and forward actions
procmail does have to fork, but only so it can execute the
action). nbr.1 and nbr.2 take the same number of forks to
execute. They also take the same effective number of writes (in
case you're concerned about that). The latter also requires that
procmail wait for the command to finish. nbr.3 is worse than the
above two, as procmail has to not only wait for the command to
complete but also save the output into the named variable.
5.13 Flags before nested block
Given the following recipe, let's examine the flag part
:0 $FLAGS
{
do-something
}
[david] HB AaEe and D affect the conditions and thus are
meaningful when the action is to open a brace. HB and D would
be meaningless, of course, on any unconditional recipe, but they
should not cause error messages. Generally, flags that affect
actions are invalid there, and bhfi and r always are, but the
others are partial exceptions: if you are using c to launch a
clone, then w W and a local lock file can be meaningful. If
there is no c, then w W and a local lock file are invalid at
the opening of a braced block.
5.14 Flags aAeE tutorial
[david] AaEe are mutually exclusive and no more than one should
ever appear on a single recipe. [philip] Actually, this is not
true. e does not work with E or a (and procmail gives a warning
if you try), and A is redundant if a is given, but at least some
of the other combinations make sense and work.
- A = try this recipe if the conditions succeeded on the most
recent recipe at that nesting level that did not itself have an
A nor an a
- a = same as A, but moreover the action must have succeeded
on the most recently tried recipe at that nesting level
- e = Almost like A, try this recipe if the conditions matched
but the action failed on the most recently tried (not skipped)
recipe at this nesting level. In effect, e is the opposite of a.
e only looks backwards past E recipes that were skipped
because of their E. It doesn't care whether a previous recipe
had an A or a flag.
- E = try this recipe if the conditions have failed on the most
recent recipe at that nesting level that did not have an E and
on every recipe at that level since then that did have an E;
essentially the opposite of A
These mnemonics might help:
- A: if you did the recipe at the start of the chain, try this one
(A)lso
- a: if the last action at that nesting level was (a)ccomplished
- e: if the last action at that nesting level (e)rred
- E: (E)lse because the conditions down the chain so far have not
matched. Or "try this recipe unless the last tried recipe matched".
# [philip] demonstrates `e'
:0 : # match, but action fails
/etc/hosts/foo
:0 A # no match
* -1^0
/dev/null
:0 e # this is skipped because the last tried recipe didn't match
{
...whatever
} |
How they interact with one another when used consecutively has not
been fully tested to my knowledge. Consider this:
:0
* conditions
non-delivering-action1
:0 a
action2
:0 e
action3 |
Is action3 done if action2 failed or if action1 failed (or perhaps
in both situations)? [philip] Action 3 is only done if action2 failed.
If the answer is action2, does this work to get action3 done if
action1 failed? I think it does, but does it also run action3 if
the conditions didn't match on the first recipe? [philip] Yes, and
yes.
:0 # [david]
* conditions
non-delivering action1
:0a
action2
:0E
action3 |
[philip] If that's not what you want, combine some flags:
:0
* conditions
non-delivering action1
:0 Ae
action3
:0 a
action2 |
If the conditions match, action1 will be executed. action3 will
then execute if action1 failed, otherwise action2 will be executed
[if action1 succeeded].
[david] I know what this structure does because I use it:
:0
* conditions
non-delivering action1
:0A
action2
:0E
non-delivering action3
:0A
action4 |
If the conditions match, action1 and action2 are performed and
action4 is not (of course action3 is not either), even if action2
is non-delivering; if they fail, action3 and action4 are performed.
The A on the fourth recipe refers back to the third and no farther.
But I don't know about this:
:0
* conditions
non-delivering action1
:0A
* more conditions
action2
:0E
non-delivering action3
:0A
action4 |
Now, suppose the conditions on the first recipe match but those on
the second recipe do not match. Would the third recipe (and thus
the fourth one) be attempted? I would expect so. [philip] Yes. The
last tried recipe didn't match, therefore the E flag will be
triggered.
If that isn't what you want, you can prevent it this way:
:0
* conditions
{
:0
non-delivering-action1
:0
* more-conditions
action2
}
:0 E # ignores mismatch inside braces, looks only at same level
non-delivering action3
:0 A
action4 |
If that is what you want, you can be positive this way:
# if action2 is non-delivering or vulnerable to error that
# would cause fall-through
DID2 # Kill variable
:0
* conditions
non-delivering-action1
:0 A
* more-conditions
{
DID2 = yes # record that the second recipe's conditions matched
:0
action2
}
:0
* ! DID2 ?? (.)
non-delivering-action3
:0 A
action4
# if action2 is delivering and sure to succeed
:0
* conditions
non-delivering-action1
:0 A
* more-conditions
action2
:0
non-delivering-action3
:0 A
action4 |
[philip] For those who are interested, I'll note that there are only
3 combinations of the a, A, e, and E flags that aren't
either illegal or redundant. They are Ae, aE, and AE. I've
shown a use for Ae up above. Here's an example of AE:
:0
* condition1
non-delivering action1
:0 A
* condition2
non-delivering action2
:0 AE
action3 |
action3 will only be executed if condition1 matched but condition2
didn't match. Without the A flag, action3 would be executed if
either of them failed. This can also be done with a instead of A
with analogous results.
Procmail's "flow-control" flags may not be particularly easy to
describe in straight terms (and this can all be made more
complicated by throwing in a more varied mix of delivering vs
non-delivering recipes), but I've found that it usually does what I
expect it to do, and when it doesn't or I'm in doubt or I want to
be particularly clear, I can always fall back to doing it explicitly
via nesting blocks. Pick your poison...
6.0 Matching and regexps (regular expressions)
6.1 Philosophy of abstraction in regexps
Here are two ways to view or write regexps. Make up your own
mind. More on regular expressions at <http://regexlib.com/>.
People who are in favor of writing pure native regexps in the
recipes:
[ ]<[ ]*("([^"\]|\\.)*"|[-!#-'*+/-9=?A-Z^-~]+)... # " |
They think:
- I'm not planning on "maintaining" that code, as the syntax for
XXX will not ever change, it's RFC or something.
- I somehow doubt that anyone else will change that regexp more than
trivially
- If none of your other regexps use the categorical variables, and
you're not changing the regexp, then what's the point?
The variablized version will be slower, and will clutter the
environment with subprocesses.
Whereas someone who immediately wants to abstract things says (this
is from philip's great Message-Id matching recipe):
dq = '"' # (literal) double-quote
bw = "\\" # (literal) backwhack
atom = "[-!#-'*+/-9=?A-Z^-~]+"
word = "($atom|$dq([^$dq\]|$bw.)*$dq)"
local_part = "$word($s\.$s$word)*"
$s<$s$local_part... # ignore comment here |
...abstraction: It makes code clearer when you break it into
manageable parts, which possibly surfaces reusable parts. It also
makes things look simpler, and enables even novices to understand
what's going on there. After we're not connected to the net
anymore, others could possibly understand it too. So, naturally we
can't agree with any of the previously mentioned arguments
presented for keeping regexps "in pure native format".
- Although you won't maintain it, it's an example for others. What
you post first, people will save it to their mailboxes and
circulate elsewhere in the net: "Hey, I've saved this, try it"
- You can write cryptic regexps or break them into parts where
the whole looks much simpler. Consider novice's welfare :-)
This has nothing to do with the "It never changes in my lifetime".
- The speed penalty imposed by additional variables is not
something we can measure in practice. CPU won't even hiccup.
An extra formail call in your recipes is 10x as expensive as
100 variables. (I don't know how to measure that, but launching
a shell and creating a process is a much more expensive task).
- Cluttering the env process? C'mon. That won't matter either.
No outside process uses lowercase environment variable names, or
else it must be a really special program. So-called "cluttering" of
environment space is also a non-issue. CPU won't even get a hiccup
for that.
6.2 Matches are not case-sensitive
Okay, okay; if you read the manual you knew that already. But
sometimes someone with years of experience with Unix may take it for
granted that procmail would be case-sensitive as the rest of the
Unix tools are. Use the D flag to turn on case-sensitivity.
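A small sketch of the D flag in use (the folder name is
hypothetical). Note that with D the header name itself must also
appear exactly as written:

```procmail
# Without D this would also match "urgent" or "Urgent".
:0 D:
* ^Subject:.*URGENT
urgent.mbox
```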
6.3 Procmail uses multi line matches
Procmail uses multi line matches by default. This means that ^ and $
match a newline, even in the middle of a regexp. Now that you know
this, you can easily interpret e.g. $[^>] as: `a newline followed by
a line not starting with a >'.
If you put a $ after the \/ match token then procmail will
include the matched newline if there's one there. Solution? Don't
put a dollar sign there unless you really want a newline; use a
period, which matches anything but a newline:
:0
* B ?? ^Search-string: \/.+ |
6.4 Headers are unfolded before matching
If you have a header that continues on separate lines, you don't have
to worry about the line feeds. Procmail silently unfolds the header onto
one line before matching it:
Received: from unknown (HELO Desktop01) (208.11.179.72) by
palm.bythehand.net with SMTP; 4 Dec 1997 23:29:09 -0000
:0 # note, match on continuation line
* ^Received:.*bythehand\.
{
# Do something
} |
6.5 Improving Space-Tab syndrome
Procmail doesn't know about standard escape codes like \t and \n
or [\0x00-\0x133]:
# Not what you think # You have to write: space + tab
[ \t] [ ] |
But using the space+tab is not very readable and it's a very error
prone construct. Here is a suggestion to use variables to improve the
readability:
WSPC = " " # whitespace = space + tab
SPC = "[$WSPC]" # regexp whitespace, the short name
# SPC was chosen because you use this
# a lot in condition lines.
NSPC = "[^$WSPC]" # negation of whitespace
:0
*$ var ?? $NSPC
{
# match anything except space and tab
}
:0
*$ ! var ?? ($SPC|$)
{
# match anything except space and tab and newline
} |
But you cannot use newline inside brackets.
WSPCL = " "' # Whitespace with line feed
'
# Won't work although WSPCL definition is correct.
*$ var ?? [$WSPCL] |
Instead use variable syntax:
SPCNL = "($SPC|$)" # space + tab + newline |
If you absolutely need a range of characters, see if you have echo
command in your system to define variables like this:
NUL_CHAR = `echo \\00`
DEL_CHAR = `echo \\0177`
REGEXP_NON_7BIT = "[^$NUL_CHAR-$DEL_CHAR]" |
6.6 Handling exclamation character
[philip] you do need the first backslash, to keep procmail from
considering the exclamation mark as a request to invert the sense of
the match. For example, these two conditions are equivalent:
* ! 200^1 foo
* 200^1 ! foo |
Therefore, a leading '!' must either be backslashed, enclosed in
either parens or brackets (I suspect that parens would be more
efficient), or prefaced with an empty pair of parens. I would
recommend writing the condition with one of these:
* 200^1 \!!!!
* 200^1 ()!!!!
* 200^1 (!!!!) |
6.7 Rules for generating a character class
In a "character class" (things between "[" and "]"), metacharacters
don't need to be escaped. Well, a backslash is an exception.
e.g. [$[^\\] would match any one of the literal characters dollar,
opening bracket, caret, and backslash.
- To match "])" use [])]
- To match "[(" use [[(]
- To include a literal ^ it must not be first
- To include a literal - it must be first, last or \-
- To include a literal \ you must use \\
- To include a literal ] it must be first
- To include a literal [ ( ) or $ just use it anywhere
[elijah] If you are inverting a character class "first" means just
after the (^). So the character class that contains everything but ]
^ and - must look like this:
[^]^-] |
[david] What if I want a literal $ inside brackets? A $ inside
brackets, unless it begins a variable name and the "$" modifier is
on, always means a literal dollar sign. It cannot mean a newline if
it appears inside brackets. A good way to keep it exempt from "$"
interpretation is to put it last inside the brackets (unless one
also needs to include a literal hyphen and can't put the hyphen
first; then you'll need to escape the dollar sign with a backslash
and put the hyphen last, or alternatively escape the hyphen, I
guess), because procmail knows that "$]" cannot possibly
be a reference to a variable.
General guideline:
- ($) always matches a newline, with or without "$" interpretation;
- [$] always matches a dollar sign, with or w/o "$" interpretation;
6.8 Matching space at the end of condition
[david] If you need to have tab or space at the end of condition line
you can use these:
* rest of string .*
* rest of string[ ]
* (rest of string )
* rest of string ()
* rest of string( ) # This may be the best |
[philip] From my looking at the source, the last two should be
equal in efficiency, and except for a trace difference in regcomp
time, should match at the same speed as a solitary trailing blank.
The character class version [ ] will be slower. Of course, I
suspect that neither you nor your sysadmin will ever notice the
difference in speed, and given that 99% of all systems are I/O
bound and not CPU bound, the system is incredibly unlikely to
notice either. I can't complain though, as I also go to various
extremes to seek out every last bit of possible performance. Ah
well. The first one would be slower yet, though perhaps no slower
than the bracket form.
6.9 Beware leading backslash
I am trying to come up with a procmail recipe that among other
things should have the condition 'body does not contain a
particular word'. Here is what I tried:
[david] You have fallen into the leading backslash problem. If the
first character of a regexp is a backslash, procmail takes it as "end
of leading whitespace" and strips it. What you coded means "a less-than
sign, then the word, then any non-word character." (It also prevents
the less-than sign from being taken as a size operator.) Unless the
non-word character immediately to the left of the word was a less-than
sign, that regexp would fail (and thus the condition would pass). Try
this:
This would work too:
but in a casual reading it would look like "literal backslash,
less-than sign, the word, word boundary character," so we on the list
generally recommend the empty parentheses.
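The three conditions discussed above can be sketched as follows.
The poster's actual word is not given, so `word` is only a
placeholder, and these are bare condition lines rather than complete
recipes:

```procmail
# As first tried: the leading backslash is stripped, so this looks
# for "<word" followed by a non-word character.
* ! B ?? \<word\>

# Recommended: empty parentheses protect the backslash.
* ! B ?? ()\<word\>

# Also works, but is easy to misread.
* ! B ?? \\<word\>
```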
Do note that the difference in meaning of \< and \> in procmail (where
they must match a non-word character) from their meaning in perl and
egrep (where they match the zero-width transition into and out of a
word respectively) does not come into play here. Because procmail's \<
and \> can match newlines (both real and putative), it rarely is a
factor. It's a problem only when a single character has to serve both
as the ending boundary of one word and also the opening boundary of
another. Well, it's also a problem when you have one as the last
character to the right of \/, but that's easily solved.
6.10 Correct use of TO Macro
- TO is not a normal regular expression; it is a special
procmail expression that is designed to catch any destination
specification. For details, see the miscellaneous section of
the procmailrc(5) man pages.
- Prefer TO_ instead of TO if you have new procmail. TO_ is
better because TO used to be too loose
- Please remember to write ^TO, with the anchor in it.
- Do not put a space between the caret (^) and the word TO in
^TO.
- Do not put a space between the ^TO and the text that you are
matching on; it must be ^TOtext. If this bothers you, you can
use ^TO()text instead to get better separation of text.
- Both letters in TO must be capitalized.
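Put together, the rules above lead to a condition like this one. It
is only a sketch; the list address and the folder name are
hypothetical:

```procmail
# Anchor, capitals, no space before the address, dots escaped.
:0 :
* ^TO_procmail-users@lists\.example\.com
procmail-users.mbox
```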
6.11 Procmail's regexp engine
[philip] procmail's regexp engine has no special optimization
for anchoring against the beginning of the line. Most programs that
have such an optimization have it because they need the line
distinction for other reasons (for example, grep by default prints
the entire line containing a match). Procmail has no such other
reason, so it treats newline like any other plain character in the
regexp. There should be no speed difference as long as procmail
can say: "the first character I see must be a 'foo'". Note that
case insensitivity is handled by making everything lowercase, so a
letter being first doesn't bring in the spectre of character-classes
or anything like that.
> recipe may have just changed the size of the head, procmail
> cannot keep a byte-count pointer nor a line-count pointer to
> where the body begins but must scan through the head to find the
> blank line at the neck before it begins a body search.
Procmail does this when it reads in the head, not when it goes to
search the body, so that cost can't be avoided. Let me repeat:
searching the body is no slower than searching the header, if we
forget the minimal impact of the size of these two.
6.12 Procmail and egrep differences
[By david]
- ^ and $ are non-zero-width and anchor to real or putative
newlines (rather than to the zero-width start and end of a line);
- An initial ^^ or a final ^^ anchors to the opening or closing
putative newline respectively;
- ^ and $ in the middle of a procmailrc regexp match to an embedded
newline (and must be escaped to match to a caret or a dollar sign);
- \< and \> are non-zero-width and match to a character that
wouldn't be in a word (or to a real or putative newline) [rather
than to the zero-width transition into or out of a word]; it
always matches one non-word character. It will fail when there is
no whitespace after the colon. This is rather pathological but
still perfectly compliant with RFC822. For this reason,
you should use (.*\<)? instead of just .*\< after the colon that
terminates a header field name:
^Subject:.*\<humor\> # Wrong
^Subject:(.*\<)?humor\> # Right, notice ? |
- *, ?, and + in the absence of \/ are stingy rather than greedy,
and that generally won't matter, but in the presence of \/ they
are stingy to the left of \/ and greedy to the right of \/,
while in most applications the leftmost wildcard on a line is
the greediest and greed decreases from left to right.
6.13 Understanding procmail's minimal matching (stingy vs. greedy)
...I want to have a procmail recipe that will save certain mail to
folders where the folder name (always a number) is specified in
the subject.
:0 :
* ^Subject: *\/[0-9]*
$HOME/Mail/$MATCH |
[philip]...and this won't quite work. For a subject with a space
after the colon, the '*' on the left hand side will be matched
minimally (zero times), and then the stuff on the right hand side
will be matched maximally, but starting at the space still, which
will match nothing. This is a case where procmail's minimal matching
can cause massive confusion and frustration. The solution is
usually the following:
FORCE THE RIGHT HAND SIDE TO MATCH AT LEAST ONE CHARACTER |
By changing the recipe to:
:0 :
* ^Subject: *\/[0-9]+
$HOME/folders/$MATCH |
it'll work, because then the left hand side will have to match all
the way up to the first digit (but not the digit itself). If you
follow the rule in caps then you'll almost always be able to ignore
procmail's weirdness in this area.
[david] And examine how procmail matches "Subject: Keywords 9999"
* ^Subject:.*Keywords.*\/[0-9]*
procmail: Match on "^Subject:.*Keywords.*\/[0-9]*"
procmail: Matched "" |
The right side was as greedy as it could be; the problem is that we
seem to expect greed on the left as well. MATCH is set to null,
contrary to our expectation. It is not a bug but rather a frequently
misunderstood effect of the way extraction is advertised to operate.
Remember that only the right side is greedy; the left side is
stingy, and left-side stinginess takes precedence over right-side
greed.
Extraction is implemented this way: the entire expression, left and
right, is pinned to the shortest possible match; then the division
mark is placed and the right side is repinned to the longest
possible match starting at the division. The tricky part is to
remember that the division is marked during the stingy stage.
If the expression is
^Subject:.*Keywords.*\/[0-9]* |
and the text is
<newline>Subject:<space>Keywords<space>9999<newline> |
then the shortest possible match to the entirety is
<newline>Subject:<space>Keywords |
because ".*" and "[0-9]*" both match to null. Then the division
mark is placed on the space after "Keywords" and procmail looks for
the longest possible match to [0-9]* starting with that space.
That, again, is null, so MATCH is set to null.
We see that it works as expected if regexp is changed to this:
^Subject:.*Keywords.*\/[0-9]+ |
That is a whole other ball of wax. Now the shortest match to the
entirety is
<newline>Subject:<space>Keywords<space>9 |
and the division mark is placed at the 9. Then procmail refigures
the longest match to the right side starting at the division mark
and sets MATCH=9999. However here
^Subject:.*Keywords\/.*[0-9]* |
the second ".*" would have reached not just up to the digits but
through them to the end of the line. MATCH would contain the rest of
the line, all of it matched by ".*", plus a null match for "[0-9]*".
[for curious reader]
Given the line "Subject: Keywords 9999", the first condition below
matches, but the second, which differs only by inserting the
extraction marker, would not match and would not set $MATCH:
^Subject: Keywords *9999 # matches ok
^Subject: Keywords *\/9999 # won't ! |
because the left side would be matched to "<newline>Subject:
Keywords" and the immediately following text, " 9999", did not match
the right side. It would actually make the condition fail and keep
the recipe from executing. It took a lot of circuitous coding to
allow for not knowing in advance exactly how many spaces there would
be before the digits.
Call it counterintuitive, but it's not a bug. General advice:
always make sure that the right side cannot match null or that the
last element of the left side cannot match null. Or in other words:
force the right-hand side of the \/ to match at least one character.
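As a sketch of that advice applied to the Keywords example, the
recipe below only logs what was extracted. The log text is made up
for illustration; per the walk-through above, a subject of
"Keywords 9999" should leave 9999 in MATCH:

```procmail
# [0-9]+ cannot match null, so the stingy left side has to stretch
# up to the first digit before the division mark is placed.
:0
* ^Subject:.*Keywords.*\/[0-9]+
{
    LOG = "extracted number: $MATCH
"
}
```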
6.14 Explaining \/ and ()\/
MATCH strips all leading blank lines in 3.11pre7
[david] \/ with nothing to the left of it means "one foreslash". To
start a condition with the extraction operator, use ()\/ or \\/;
the latter looks, counterintuitively, like "literal backslash and
literal foreslash" (as it would mean if it appeared farther along
in the regexp), so most of us prefer the former.
*$ var ?? $s+\/$d+ # ok, \/ in the middle
*$ var ?? \/$d+ # Wrong, when \/ is at the beginning
*$ var ?? ()\/$d+ # Now ok, () at the beginning |
6.15 Explaining ^^ and ^
[philip] Procmail doesn't think in lines when it matches; it
concatenates all lines together and then runs the regexp
engine. This may be a bit surprising, but consider the following where
we want to discard any message that is likely an HTML advertisement:
# Body consists entirely of HTML code
# something which'll match any message which has "<HTML>"
# in the body
:0 :
*$ B ?? $s*<HTML>
HTML.mbox |
The condition test is applied to the entire body. If you want to
limit it to match only against the beginning of the body, you have
to say so using the ^^ token, as you discovered. A simple line
anchor (^ or $) just says that there must be a newline (or the
beginning or end of the area being searched) at that particular
point in the text being matched. Notice the leading anchors below.
# trap spam where the *very* first line of the body started with
# <HTML>
:0 :
*$ B ?? ^^$s*<HTML>
HTML.mbox |
What, exactly, does "Anchor the expression at the very start of
the search area..." (i.e. the ^^) mean?
[dan] Technically, an opening ^^ anchors to the putative
newline that procmail sees before the first character of the search
area (and a closing ^^ anchors to the putative newline that
procmail sees after the end of the search area). When the search
area is B, that is a point equivalent to the second of the two
adjacent newlines that enclose the empty line that marks the end of
the head.
The reason I'm bringing that up is this: if there are multiple
empty or blank lines between the head and the body, ^^ will mark
the start of the second of those lines, not the start of the first
line of the body that contains some text.
So if you want to test whether <pattern> is the first printing text
in the body, even if it is not necessarily flush left on the very
first line, you might need a condition like the following, where
$SPCNL (see 6.5) stands for space, tab or newline.
*$ B ?? ^^$SPCNL*<pattern> |
6.16 ANDing traditionally
Erm, you knew this already if you read the man pages. Stacking
condition lines one after another does the AND operation, where
all of the conditions must be present:
* condition1
* condition2 |
6.17 ORing traditionally
Here is simple OR case. There are some cases where it's impossible
to OR conditions with this style. [philip] knows more about those
cases.
Likewise, two exit code tests can often be ORed like this
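A sketch of both simple forms, with made-up patterns, file names and
folders. The first uses plain regexp alternation; the second relies
on the shell's || operator, which assumes the command is handed to
the shell because it contains a character from SHELLMETAS:

```procmail
# Two regexps against the same search area: use alternation.
:0 :
* ^Subject:.*(make money|free offer)
junk.mbox

# Two exit code tests that both seek success: let the shell OR them.
:0 :
* ? test -f $HOME/.vacation || test -f $HOME/.away
held.mbox
```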
But there are many situations where two tests cannot be ORed by
combining them into one condition:
- a regexp search of one area ORed with a regexp search of a
different area
- a positive regexp search [i.e., for a match to its pattern] ORed
with a negative regexp search [i.e., for the absence of any
match to its pattern]
- an exit code condition ORed with a regexp search condition
- an exit code condition seeking success ORed with an exit code
condition seeking failure
- a size test ORed with anything else (even another size test)
How can I make OR conditions that all use the SAME action? I want
to be able to test for a number of variants on certain requests,
all in one block.
[hal] Yes, this can be easily done
CASE = ""
:0
* case 1 tests
{
CASE = 1
}
:0 E
* case 2 tests
{
CASE = 2
}
:0
* ! CASE ?? ^^^^
{
# real work, perhaps with explicit tests on CASE
} |
Case study: Finding text from header and body
[david] In addition to the standard ways of coding OR, here's a
special one for searching the subject and the body for a given word
in either:
* HB ?? ^^(.+$)*(Subject:(.*[^a-z0-9])?|$(.*\<)*)remove\> |
If the string doesn't have to be preceded by a word border, it gets
a little simpler:
* HB ?? ^^(.+$)*(Subject:.*|$(.|$))*string |
6.18 ORing and score recipe
Once any of the conditions match, the score gets a positive value and
the recipe succeeds. Idea by Erik Selke <selke A T tcimet.net>
[era comments] ...allegedly the scoring system is going to cost you
more than plain old regex matching. Floating-point math and all that,
even if you use extremely simple scoring. Thus, it would probably be
slightly more efficient to do it the De Morgan way.
* 1^0 condition1
* 1^0 condition2 |
We can now write the previous case study (HB ORing traditionally)
with scores. I was tempted to write it like this, when [david]
told me the following.
* 1^0 H ?? match-it
* 1^0 B ?? match-it |
[david] That will work, but it isn't the best way to do ORing,
because if a match is found to the first condition procmail still
takes the trouble to test the second one. Better, use the supremum
score on each condition:
SUPREME = 9876543210
*$ $SUPREME^0 first_condition_to_be_ORed
*$ $SUPREME^0 second_condition_to_be_ORed
* ... etc. ...
*$ $SUPREME^0 last_condition_to_be_ORed |
Upon reaching the supreme score, procmail will skip all remaining
weighted conditions on the recipe, deeming them matched. Since all
conditions on this recipe are weighted, once procmail finds one
matched condition it will skip the rest and execute the action.
6.19 ORing by using De Morgan rules
[Tim Pickett <tbp A T cs.monash.edu.au>] I thought I'd point out that
there are a few ways to do a logical OR of conditions. Someone posted
a solution here that involved using procmail's scoring system, but I
figured you could do it without scoring by taking advantage of De
Morgan's rule:
a or b is same as not(not a and not b) |
or mathematically:
a || b == !(!a && !b) |
Here's a way to do ORing:
:0
* ! condition1
* ! condition2
{ } # official procmail no-op. MUST LEAVE SPACE
:0 E
action_on_condition1_or_condition2 |
7.0 Variables
7.1 Setting and unsetting variables
You have already set variables with the "=" syntax. Variable names
are case sensitive: var is different from VAR
VAR = /var/tmp # directory
VAR = "this" # literal
VAR = 1
VAR = $FOO # another.
VAR = "$VAR at" # combined with previous value |
Unsetting a variable is done like this
VAR # kill variable.
VAR= # same, but with old style
VAR = "" # Variable is said to be "null" now |
And you can put multiple assignments on the same line, although
not recommended:
Examine the following, which are all equivalent. The back ticks will not
require a shell in the absence of any SHELLMETAS, so none of
these will spawn a shell:
# case1: We Don't care if file exists this time...
VAR = `cat file`
# case2: The use of {} is considered "modern"
:0
* condition
{
VAR = `cat file`
}
# case3: oldish, and procmail specific and errors have
# been reported if you use this construct.
# Note: There must be no space in "VAR=|"
:0
* condition
VAR=| cat file |
7.2 Variable initialization and sh syntax
Procmail borrows some sh syntax for variable initialization.
Note that sh's ${var:=default} and ${var=defaultvalue}
syntaxes are not available in a procmail rcfile.
- VAR1 = ${VAR2:-value}
sets VAR1 to VAR2 if VAR2 is set and non-null, and sets VAR1 to
default "value" otherwise
- VAR1 = ${VAR2-value}
sets VAR1 to VAR2 if VAR2 is set, and sets VAR1 to default
"value" otherwise
- VAR1 = ${VAR2:+value}
sets VAR1 to "value" if VAR2 is set and non-null, and sets VAR1
to VAR2 (which is then null or unset) otherwise.
- VAR1 = ${VAR2+value}
sets VAR1 to "value" if VAR2 is set, and sets VAR1 to VAR2
(which is then unset) otherwise.
And here are the classic usage examples
VAR = ${VAR:-"yes"} # set VAR to default value "yes"
VAR = ${VAR+"yes"} # If VAR is set (even to null), set "yes" |
Ever wondered if this calls `date` in all cases?
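The assignment the question refers to is not shown here; presumably
it has the following shape (an assumption on my part):

```procmail
# Fall back to the output of date only when VAR is unset or null.
VAR = ${VAR:-`date`}
```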
No, procmail is smart enough to skip calling date if VAR already
had a value. It doesn't evaluate the whole line. Below you see what
each initialising operator does. Study it carefully:
VAR = "" # Define variable
VAR = ${VAR:-"value1"} # VAR = "value1"
VAR = ""
VAR = ${VAR-"value2"} # VAR = ""
VAR = ""
VAR = ${VAR:+"value3"} # VAR = ""
VAR = ""
VAR = ${VAR+"value4"} # VAR = "value4"
# Note these:
VAR = "val"
VAR = ${VAR:+"value3"} # VAR = "value3"
VAR = "val"
VAR = ${VAR+"value4"} # VAR = "value4"
VAR # kill the variable
VAR = ${VAR:-"value1"} # VAR = "value1"
VAR
VAR = ${VAR-"value2"} # VAR = "value2"
VAR
VAR = ${VAR:+"value3"} # nothing is assigned
VAR
VAR = ${VAR+"value4"} # nothing is assigned |
And if you want to choose from several initial values,
you might use the recipe below
instead of the standard var = ${var:-"value"}.
:0
* VAR ?? ^^^^
{
# no value (or was empty), set default value here based on
# some guesses
VAR = "base-default"
:0
* condition
{
VAR = "another-default"
}
...more conditions..
} |
You could also use an equivalent, but less readable, condition line in
the previous recipe:
It works, because if the variable contains a value the line expands to
Where "!" is the procmail "false" operation. One more way to do the
same would be to require at least one character to be present.
You could also use the regexp (.), which requires at least one
character to be present, but you might not like matching pure spaces.
7.3 Testing variables
If possible, perform positive tests, rather than negative, like below:
With negative test, this would be:
Using literal strings like "yes" and "no" might show more clearly
what is going on than a traditional "!" negation of a test.
Note, that the following fails if the variable is unset or null.
That was why it would be better to test:
Or
to require that variable contain at least one character. But
neither is a way to check whether a variable is set or not, because
each treats a null variable the same as an unset one. This is the
best way to check whether a variable is set or not:
[<gsutter A T pobox.com>] Here is yet another way to test if variable
is set and if it isn't, sets it to a default value.
:0
*$ ! VAR^0
{
VAR = "value"
} |
7.4 What does $\VAR mean?
[era and david] In procmail 3.11, $\VAR escapes regexp metacharacters.
It should produce a suitably backslash-escaped expression for
procmail's own use. In addition, $\VAR will always begin with leading
empty parentheses.
You can't pass the $\VAR construct to shell programs, because there
is that leading parenthesis. Here's a recipe to standardize the regexp.
You can pass SAFE_REGEXP to external programs like sed.
PROCMAIL_REGEXP = "$\VAR"
:0
* PROCMAIL_REGEXP ?? ^^\(\)\/.*
{
SAFE_REGEXP = "$MATCH"
} |
[era] Note that this is slightly inexact; Procmail will
backslash-escape according to Procmail's needs, not sed's. For
example, Procmail doesn't think braces are magic (although that would
be nice to have in Procmail as well) whereas many modern variants of
sed do.
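For procmail's own use, a typical place for $\VAR is a condition
that must match a variable's contents literally. A minimal sketch,
with a hypothetical address and folder name:

```procmail
ADDRESS = "john.doe+lists@example.com"

# $\ADDRESS expands with the dots and the plus sign backslashed, so
# they match literally. The "$" modifier on the condition is needed.
:0 :
*$ ^From:.*$\ADDRESS
john.mbox
```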
7.5 Common pitfalls when using variables
Procmail is picky and forgives nothing. Here are some of the favorite
mistakes one can make:
$EMAIL = "foo@example.com" # Done Perl lately? Remove that $
# Erm, this is ok, but many procmail recipe writers want to
# take extra precautions and include the regexps in parentheses.
# So, maybe (yabba|dabba|doo) would be more safe
REGEXP = "yabba|dabba|doo"
* Subject:.*$REGEXP # Hey, you need the "*$ Subject..."
*$ $REGEXP ?? hello # surely you meant '* REGEXP ?? hello' |
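For comparison, here is a sketch of the same lines with the mistakes
removed. The folder names are hypothetical, and the last recipe
tests EMAIL rather than REGEXP simply to have something sensible to
match:

```procmail
# No leading $ when assigning.
EMAIL = "foo@example.com"
# Parenthesised, so it is safe to embed in a longer regexp.
REGEXP = "(yabba|dabba|doo)"

# "*$" makes procmail expand $REGEXP inside the condition.
:0 :
*$ ^Subject:.*$REGEXP
flintstones.mbox

# Left of ?? goes the bare variable name, without $.
:0 :
* EMAIL ?? example\.com
example.mbox
```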
7.6 Quoting: Using single or double quotes
Pay attention to this:
VAR = "you"
NEW = 'hey "$VAR"' # won't expand $VAR; you get the literal text
NEW = "hey '$VAR'" # expands to: hey 'you' |
You can even combine separate words together
VAR = "1 ""and"" 2" # same as "1 and 2" |
Don't let these many quotes disturb you, just count the beginning
and ending quotes. Superfluous here, but you may need some similar
construct somewhere else.
VAR = '1 '"'"'and'"'"' 2' # same as: 1 'and' 2 |
[david] Beware forgetting quotes, like when you'd do
SENDMAILFLAGS = -oQ/var/mqueue.incoming -odq |
Procmail translates ! into | "$SENDMAIL" "$SENDMAILFLAGS" as the
procmailrc(5) man page warns us. By the rules of sh quoting, that
means that the shell sees only the first switch:
% sendmail -oQ/var/mqueue.incoming |
My suggestion: since you need a soft space inside $SENDMAILFLAGS,
use the quotes when you define $SENDMAILFLAGS but do this instead
of using the ! operator for forwarding:
SENDMAILFLAGS = "-oQ/var/mqueue.incoming -odq" |
[Walter Haidinger <walter.haidinger A T gmx.net>] Here's yet another
approach: deliver messages from procmail directly to mailboxes in
all those users' homes. No sendmail involved, much lower loads.
:0:
* <condition>
/var/spool/mail/someuser |
[philip] Assuming that "someuser" is an actual user in the
password file (I haven't been following this thread, so maybe
that isn't true here), then the following is probably better:
# Walter Haidinger comments on this recipe: I'm happy to announce that
# this works really well. No harm is done to the system-load
# anymore. What a relief!
:0 w
* conditions
|procmail -d someuser |
That lets procmail's very tricky "screenmailbox()" routine take
care of bogus mailboxes in a secure fashion.
Is that as safe as forwarding? Does another sendmail delivering
to /var/spool/mail/someuser use the same locking mechanism and notice
that mailbox is already locked? I don't want to risk a corrupt
mailbox.
[philip] Sendmail only delivers directly to files through
aliases that say things like:
whatever: /some/local/file |
Under normal circumstances, sendmail calls the local mailer to actually
store mail in a file, and since that's procmail (right?), there
shouldn't be a problem. Also, sendmail 8 does kernel-level locking
when it delivers directly.
7.7 Quoting: Passing values to an external program
Remember to include the double quotes when you pass variable
values to shell programs. Below you see a mistake: the content
of SUBJECT is not quoted, so it is split into separate words and
the whole subject is not available in Perl's $ARGV[0].
:0 # Use procmail match feature
* ^Subject:\/.*
{
SUBJECT = "$MATCH"
}
:0
* condition
| perl-script $SUBJECT # mistake; use "$SUBJECT" |
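The effect of the missing quotes can be reproduced in a plain Bourne-style
shell. This is only an illustrative sketch: count_args is a hypothetical
helper, not part of procmail, and the subject text is made up.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many
# arguments it received.
count_args() {
    echo "$#"
}

SUBJECT="Re: hello world"

count_args $SUBJECT      # unquoted: the value is split into 3 words
count_args "$SUBJECT"    # quoted: the value arrives as 1 argument
```

A script that expects the subject in its first argument sees only "Re:"
in the unquoted case.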
There is also another way. If your script can access environment
variables (almost all programs can), then you do not need to pass
the variables on the command line. Above, the SUBJECT is already
in the environment and in Perl you can get it with:
$SUBJECT = $ENV{SUBJECT}; |
Next, do you know what the difference is between these two recipes?
:0
| "command arg1 arg2 arg3"
:0
| command "arg1" "arg2" "arg3" |
You guessed it. The first one quotes the entire command line, so the
whole string is treated as a single word, and it does not do the right
thing. The latter is correct; whether the quotes are strictly needed
depends on the content of the argN variables. Anyway, play safe and
always add quotes.
Sometimes you need trickier quoting to get single quotes around
the arg. Pay attention to this, because this may be the reason
why your grep command doesn't seem to succeed as you expect.
# If $GREP "$arg" doesn't seem to work
:0
* ? $GREP "'"$arg"'" $DATABASE
{
# Do something
} |
7.8 Passing values from an external program
External programs cannot set procmail variables directly. A program
must write the values to an external file, and procmail then reads
them from that file. Capturing only one value is easy:
var = `command` # capture STDOUT |
But if a program modifies the body and also exports some status
information, it is trickier. We assume here that the script is
controlled by you and that you have added an
--export-status option, which causes the program to print
the information to a separate file.
LOCKFILE = $HOME/.run$LOCKEXT # protect external file writing
valueFile = $HOME/tmp/values
# modify body, and export status values to external file: one
# value in every line
#
# VALUE1
# VALUE2
# VALUE3
:0 fb
| $NICE script.pl --export-status $valueFile
values = `cat $valueFile`
# Derive values from each line
:0 # line 1
*$ values ?? ^^\/[^$NL]+
{
var1 = $MATCH
}
:0 # line 2
*$ values ?? ^^.*$\/[^$NL]+
{
var2 = $MATCH
}
:0 # line 3
*$ values ?? ^^.*$.*$\/[^$NL]+
{
var3 = $MATCH
}
LOCKFILE # Release lock |
[richard] Alternatively write valueFile from your rc or external
program with lines like
PARAM1="value for param 1"
PARAM2="value for param 2"
PARAM3="value for param 3" |
and read it by naming valueFile in an INCLUDERC assignment.
Now there is no need to worry about synchronizing the read with the
lines, or about adding new parameters, since each is labeled in
valueFile.
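A minimal sketch of the writing side, assuming the external program is a
shell script. The function name write_values and the three values are
hypothetical; only the NAME="value" line format comes from the text above.

```shell
#!/bin/sh
# write_values FILE: hypothetical helper that stores three status
# values as labeled assignment lines, one per line.
# Assumption: the values themselves contain no double quotes.
write_values() {
    {
        printf 'PARAM1="%s"\n' "value for param 1"
        printf 'PARAM2="%s"\n' "value for param 2"
        printf 'PARAM3="%s"\n' "value for param 3"
    } > "$1"
}
```

Because every line carries its own label, the order of the lines in the
file no longer matters to the reader.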
7.9 Incrementing a variable by a value N
[dan, phil and Richard] Here's a recipe for incrementing a variable
by a value N. If $VAR is not a number, we get an error. Note that
if $VAR + $N is not greater than 0, this recipe will not change the
value of VAR if the assignment happens inside braces. You must
place the assignment after the closing curly brace.
:0
*$ $VAR ^0
*$ $N ^0
{ } # procmail no-op
VAR = $= |
7.10 Comparing values
It's too expensive to call the shell's test command to do
[-lt|-eq|-gt] because you can do the same with procmail. The
do-something below is run if SCORE < MAXIMUM. The recipe simply
subtracts SCORE from MAXIMUM and checks whether the result is
positive.
:0
*$ -$SCORE ^0
*$ $MAXIMUM ^0
{
.. do-something
} |
[idea by era] It gets slightly cumbersome if the test is whether
SCORE lies between MIN and MAX:
:0
*$ $SCORE ^0
*$ -$MIN ^0
{
dummy # no-op, just for the LOG
:0
*$ -$SCORE ^0
*$ $MAX ^0
{
suitable
}
} |
E.g., when the values are MIN=1, MAX=5, SCORE=4:
procmail: Assigning "SCORE=4"
procmail: Score: 4 4 ""
procmail: Score: -1 3 ""
procmail: Assigning "dummy"
procmail: Score: -4 -4 ""
procmail: Score: 5 1 ""
procmail: Assigning "suitable" |
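The arithmetic behind this log can be checked outside procmail. The shell
sketch below is meant only for verifying the numbers by hand, not for use
in a recipe (calling the shell is exactly the cost this section avoids).
in_range is a hypothetical name, and the sketch assumes that a recipe
matches only when its final score is greater than zero.

```shell
#!/bin/sh
# in_range SCORE MIN MAX: succeed when both running totals are positive,
# mirroring the two nested recipes above.
in_range() {
    outer=$(( $1 - $2 ))     # $SCORE ^0 followed by -$MIN ^0
    inner=$(( $3 - $1 ))     # -$SCORE ^0 followed by $MAX ^0
    [ "$outer" -gt 0 ] && [ "$inner" -gt 0 ]
}

in_range 4 1 5 && echo suitable
```

With SCORE=4, MIN=1 and MAX=5 the two totals are 3 and 1, the same final
values that appear in the "Score:" lines of the log.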
7.11 Strings: How many characters are there in a given string?
:0
* 1^1 VAR ?? .
{ }
LENGTH = $= |
7.12 Strings: How to strip trailing newline.
Suppose you have used a regexp that left a trailing newline ($) in MATCH.
If you wonder why the recipe works, remind yourself that the regexp
operator "." never matches a newline.
:0
* VAR ?? ^^\/.+
{
VAR = $MATCH
} |
7.13 Strings: deriving the last N characters of a string.
# 1998-06-23 PM-L [walter] Note the use of
# the $ sign below to anchor to end-of-string...
#
# For last 2 characters use * VAR ?? ()\/..$
# For last 5 characters use * VAR ?? ()\/.....$
:0 # Last character
* VAR ?? ()\/.$
{
TAIL = $MATCH
} |
7.14 Strings: Getting partial matches from a string.
[dan] Getting a match to the right is quite easy with procmail's
match operator.
VAR = "1234567890"
:0
* VAR ?? ()\/3.*
{
result = $MATCH # now 34567890
} |
but deleting 2 characters from the end is nearly impossible without
forking an outside process. The cheapest might be expr because it
doesn't need a shell to pipe echo to it (as sed would and I
believe perl would):
# by resetting the shellmetas, this will only call
# `expr'. If we had not fiddled with shellmetas,
# this would have called two processes: sh + expr
saved = $SHELLMETAS
SHELLMETAS
result = `expr "$VAR" : '\(.*\)..'` # now 12345678
SHELLMETAS = $saved |
ksh or bash could do it as well:
# semicolon to force invoking a shell, actually
# first question mark will force a shell already.
saved = $SHELL
SHELL = /bin/sh
result = `echo ${VAR%??} ;`
SHELL = $saved |
Now, if you know that the last two characters will be "90", that's
different. Of course, this totally screws up if the third-to-last
character is a 9.
:0
* VAR ?? ()\/.*[^0]
* MATCH ?? ()\/.*[^9]
{
result = $MATCH # now 12345678
} |
[jari] Comments: If a shell must be used, then awk is a good tool for
simple string manipulation. Its startup time is faster than perl's,
whose overhead is due to internal compilation. awk also consumes
fewer resources overall than perl. The following will only work if VAR
is one contiguous block of characters. (ARGV[1] can be used.)
saved = $SHELLMETAS
SHELLMETAS
VAR = ` awk 'BEGIN{ v = ARGV[1]; \
print substr(v,1,length(v)-2); exit }' \
"$VAR" \
`
SHELLMETAS = $saved |
This version requires some file, any file, so that we get awk
started. In the previous code all the work was done in the BEGIN
block and no file was ever opened.
saved = $SHELLMETAS
SHELLMETAS
VAR = ` awk '{print substr(v,1,length(v)-2); exit }' \
v="$VAR" /etc/passwd \
`
SHELLMETAS = $saved |
[dan] comments on awk: expr is sure to be a smaller binary than awk
for procmail to fork, and it needs much less command-line code to
do this job. Note also that one still has to diddle with SHELLMETAS
to avoid a shell, because the awk code contains brackets; thus it
doesn't replace all.
There is also a way to remove words from the end of a string by
procmail means if the words are separated by the same separator. Let's
use the word this-mailing-list-request, which we would like to shorten
to this-mailing-list. [david] presented the recipe 1998-06-16 in PM-L.
VAR = "this-mailing-list-request"
# 1) Check that the string ends in one of these words
# 2) Get everything up to the last dash and store it in MATCH
# 3) Read MATCH, but exclude the last dash "-"
:0
* VAR ?? -(owner|request|help)^^
* VAR ?? ^^\/.*-
* MATCH ?? ^^\/.*[^-]
{
VAR = $MATCH
} |
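For comparison, here is how the same two jobs look in a POSIX shell. This
is a sketch for trying out the expressions before putting them into
backticks in a recipe; the function names chop2 and strip_suffix are
hypothetical.

```shell
#!/bin/sh
# chop2 STRING: print STRING without its last two characters,
# using the same expr pattern as the recipe above.
chop2() {
    expr "$1" : '\(.*\)..'
}

# strip_suffix STRING: drop a trailing -owner, -request or -help.
strip_suffix() {
    case "$1" in
        *-owner|*-request|*-help) echo "${1%-*}" ;;
        *)                        echo "$1" ;;
    esac
}

chop2 "1234567890"                        # prints 12345678
strip_suffix "this-mailing-list-request"  # prints this-mailing-list
```

Note that expr exits with a non-zero status when the result is empty, so a
string of exactly two characters needs separate handling.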
7.15 Strings: Procmail string manipulation example
[1998-06-23 PM-L walter] ... Now we get to apply these formulas
to strip the last character off a string. It gets a bit ugly for
special cases. I've deliberately chosen a worst-case scenario.
VAR = "Testing 012301230111"
RC_APPEND = $PMSRC/pm-myappend.rc
:0
* VAR ?? ()\/.$
{
TAIL = $MATCH # last character of VAR "1"
# Get the longest match that does not end in the TAIL character
:0
*$ VAR ?? ()\/.*[^$TAIL]
{
HEAD = $MATCH # now "Testing 012301230"
# if the last two or more characters in VAR are
# identical, they all get chopped, oops
:0
* -1^0
* 1^1 VAR ?? (.)
* -1^1 HEAD ?? (.)
{
dummy = "tooshort"
INCLUDERC = $RC_APPEND
}
}
}
result = $HEAD # "Testing 01230123011"
# ........................................ pm-myappend.rc
# LENGTH(HEAD) plus 1 SHOULD equal LENGTH(VAR). That is
# not the case when the last 2 (or more) ending
# characters are identical. In that case, call appendrc
# recursively to stick back an appropriate number of
# TAIL characters.
:0
* -1^0
* 1^1 VAR ?? (.)
* -1^1 HEAD ?? (.)
{
HEAD = "$HEAD$TAIL"
INCLUDERC = $RC_APPEND
} |
7.16 How to raise a flag if the message was filed
FILED = ! # ! is procmail "false"
:0 c: # We process the message more
* condition
foo
:0 a
{
FILED # Kill variable
}
...
:0 # Stop if previous cases filed the message
*$ $FILED
{
HOST = "_done_"
} |
Or alternatively: procmail automatically sets LASTFOLDER if
it delivers the message to a mailbox.
LASTFOLDER # kill variable
:0 c:
* condition
foo
:0 c:
* condition
bar
... et cetera ...
:0
* ! LASTFOLDER ?? ^^^^ # Or ${LASTFOLDER+!}!
{
HOST = "_done_" # Force procmail to stop
} |
7.17 Dollar sign in condition lines.
#todo, check this recipe
This doesn't seem to work for me...
* ^TO()$\foo@example.com |
[david] An unescaped dollar sign later in the line represents a
newline, so what you have there is searching for the following:
- An expression that matches the expansion of the ^TO token (which
is anchored to the start of a line by its definition), followed
by
- A newline, followed at the start of the next line by
- "foo@example" [the backslash escapes the f, which didn't need
escaping], followed by
- any character that is not a newline (the period is unescaped),
and finally
- "com".
Try this instead:
*$ ^TO()$\foo@example\.com |
#todo: the dollar seems exactly the same in the above two
#todo Examples: are you sure that this is correct?
In fact, to avoid matches to things like foo@example.community.edu,
you might want to do it this way:
*$ ^TO()$\foo@example\.com\> |
7.18 Finding mysterious foo variable
I have my fellow worker's procmail code and he uses a variable FOO
that I can't find in his code anywhere. It's not a shell variable
either, because it's literal. Where does it come from?
Your procmail runs /etc/procmailrc when it starts, please check
that. It may define some common variables already for all users.
7.19 Storing code to variable
One way to run complex code in a procmail recipe is first to store
it in a variable. Idea by [era]. You could do this in a separate shell
script too. The following example reads URLs from the body of
a message: the URLs have been put to separate lines and some special
Subject is used to trigger the dumping of the HTML pages:
# Code by [era]
#
COMMAND='while read url; do
case "$url" in
*://*)
lynx -traversal -realm -crawl -number_links "$url" |
$SENDMAIL $LOGNAME
;;
esac
done'
# Notice the trailing semicolon after `eval' !
:0 bw
* ^Subject: xxxxx
| eval "$COMMAND" ; |
If you want to run the code inside the nested block, then look
carefully, there are double quotes around the command in back ticks.
If you leave double quotes out, then each word in SH_CMD would be
interpreted separately:
SH_CMD = 'echo "$VAR" >> $HOME/test.tmp'
:0
* condition
{
# condition satisfied; run the given shell command
# and do something more.
dummy = `"$SH_CMD"`
..rest of the code..
} |
A similar construct works for message echoing too:
MESSAGE='Thank you so much for your message.
Unfortunately, the volume of mail I receive .... (blah blah blah).
If your matter is urgent, try calling +358-50-524-0965.
'
:0 hw
* ! ^X-Loop: moo$
| ($FORMAIL -rt -A "$MYXLOOP"; echo "$MESSAGE") | $SENDMAIL |
7.20 Getting headers into a variable
[david] Here are several ways to get the entire header into a variable:
HEADER = `$FORMAIL -X ""` # The space after the X is vital.
HEADER = `sed /^$/q` # also writable as HEADER=`sed /./!q`
:0 h
HEADER=|cat - |
Each of these will save the entire header into one variable. It has to
be smaller than LINEBUF, though. This way might work as well, and will
require no outside processes if it does:
:0
* ^^\/(.+$)*$
{
HEADER = $MATCH
} |
7.21 Converting value to lowercase
If you know that a word belongs to a set of choices, you can do
this inside procmail:
LIST = ":word1:word2:word3:word4" # Colon to separate words
WORD = "WORD1"
:0
*$ LIST ?? :\/$WORD
{
WORD = $MATCH
} |
But if you don't know the word or string beforehand, then this is
the generalized way: [idea by era and david]
:0 D
* WORD ?? [A-Z]
{
WORD = `echo "$WORD" | tr A-Z a-z`
} |
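The shell part of the generalized recipe can be tried on its own. The
sketch below is one possible variant, not the recipe's exact command: it
uses printf instead of echo (echo may interpret backslashes or leading
dashes) and character classes instead of A-Z. The name to_lower is
hypothetical.

```shell
#!/bin/sh
# to_lower STRING: print STRING converted to lowercase.
to_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

to_lower "WORD1"    # prints word1
```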
8.0 Suggestions and miscellaneous
8.1 Speeding up procmail
- Use absolute paths to spare the shell the burden of searching for the
binary along PATH: use the $FORMAIL variable abstraction.
FORMAIL = "/usr/local/bin/formail"
:0 fhw
| $FORMAIL -I "X-My-Header: value" |
- Multiple echo commands that spread over many lines can be converted
to a single echo command if the \n escape is supported. You usually
see these in auto responders:
echo "........."; \
echo "........."; \
echo ".........";
-->
echo ".........\n" \
".........\n" \
".........\n"; |
- You can avoid multiple and possibly expensive FROM_DAEMON tests
by caching the result at the top of your .procmailrc. You can
now use the variable $from_daemon like its big brother FROM_DAEMON.
The same idea can be applied to the FROM_MAILER regexp. If you have
pm-javar.rc, it already defines the variables $from_daemon and
$from_mailer exactly like here:
from_daemon = "!"
:0
* ^FROM_DAEMON
{
from_daemon = "!!" # double !! means "OK"
}
:0
*$ ! $from_daemon
{
..do-it..
} |
- Count the back ticks and you know how many shell calls procmail
has to launch. See if you can minimize them and use some procmail
code instead.
- ^TO and other macros are expensive, see if you can use simple
Header:.*\<match-it\> instead. Well, it's not clear if this
gives you much speed advantage.
- Don't call "$FORMAIL -xHeader:" every time you need a header
value, consider if it suffices to use match operator \/.
- You can minimize the calls to only one formail if you add many
headers along the way: See formail usage tips in this document
- Searching body is expensive, simply because it contains more text.
There isn't much to do about this, because you use B anyway
when you need it.
- See if you can move some tasks to your .cron file. procmailrc is
not meant for those purposes. Instead of calculating daily
values every time in procmail, let cron do that at 04:00 or
21:00. Don't run cron at midnight if you can avoid it, because everybody
else is running their cron jobs at the same time. If a "logical" date
change time can be used (when you arrive at work, when you
leave work), use it in cron jobs.
- [philip] Setting LINEBUF permanently to a big value slows
procmail down.
- Remove all calls to perl and use programs that are nicer to
the system (If you just call command line perl, there is
probably an equivalent alternative with awk tr sed cut)
- Examine each shell command and see if you do need SHELLMETAS.
If you can set SHELLMETAS to empty, this saves calling "sh" for
each invocation of the external command.
8.2 See the procmail installation's examples
Did you remember to look at the examples that come with procmail? If
not, it's time to give them a chance to educate you. Here is one
possible directory you could take a look at. Ask your sysadmin if you
can't find the directory to look into.
% ls /usr/local/lib/procmail-3.11pre7/examples/ |
Or if you're really anxious to get going on your own, try this. The
directory /opt/local is for HP-UX 10 machines, and the forward file
contains an example of how to define your .forward for procmail.
% find /opt/local/ -name "forward" -print |
If the find succeeded and found the file, then you know where the
procmail files installation directory is.
8.3 Printing statistics of your incoming mail
If you keep the procmail log running, it will record the
folder in which each message was filed. There is a program, mailstat,
which can process the procmail.log file and print a nice summary of it.
If you generate the summary at midnight and clear the log, you
get a pretty nice per-day, per-folder traffic analysis.
# -m merges all error messages into a single line
% mailstat -km procmail.log |
8.4 Storing UBE mailboxes outside of quota
I want to store spam outside my disk quota. Problem: if I tell
procmail to deliver to, say, /tmp/spam.box, it does so just fine
(according to the log). Unfortunately, it delivers to /tmp on the
mail host which I cannot access. spam.box doesn't appear in the
/tmp directory of the shell machine when procmail is invoked for
incoming mail.
[philip] Under the most likely configuration of sendmail in
this situation, it is impossible to have procmail invoked by
sendmail on the shell machine: sendmail is probably set to just
forward all mail to the designated mail delivery machine.
There are other options: you could temporarily store the mail in
your account, then have a cron job on the shell machine that
reprocesses the message. That would probably be more efficient than
having each message trigger an rsh to the shell machine. If you
actually get enough spam that it's pushing against your quota, then
the rsh is too expensive – use a cron job that invokes something
like:
cd your-maildir &&
lockfile spam.lock &&
test -s spam &&
{
cat spam >> /tmp/spam.box && rm -f spam spam.lock || \
rm -f spam.lock;
} |
WARNING: the above assumes the following:
- everything in your-maildir/spam is spam and belongs in
/tmp/spam.box
- no further filtering of the messages is necessary: they just need
to be moved (it actually treats everything in
your-maildir/spam as a single message and uses procmail as a
reliable copy command, thus the DEFAULT assignment and the use
of /dev/null as an empty procmailrc)
- /tmp/spam.box is not a directory
If the latter two of those conditions aren't true OR IF THEY MIGHT
CHANGE then you should use formail -s to break the message apart
and invoke procmail on each one separately.
[era] Many sites cross-mount directories for various reasons. /tmp
is always local but /var/tmp might be cross-mounted between the
login host and the mail host; another one to try is /scratch – and
if all else fails, ask your admin to set up an NFS share for this
purpose.
8.5 Using first 5-30 lines from the message
[era] The regexp to grab a few lines (or all of them, if there are
fewer than fifty) is not going to be very pretty, but it saves launching
an extra process.
:0
*$ B ?? ^^$SPCNL*\/$NSPC.*$(.*$)?(.*$)?
{
toplines = $MATCH
} |
The skipping of whitespace at the beginning of the message is of
course not necessary. You should probably set LINEBUF reasonably
high if you grab many lines, say 30: 80*30 = 2400 bytes; probably
setting it to 8192 or 16384 is a good idea, depending how much you
want to match. The above gets ugly quickly, so
# But if N=30, sed ${N}q if you don't have head
:0 i
{
toplines = `head -$N`
}
:0 a
* toplines ?? pattern
{
...do-it
} |
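The sed alternative mentioned in the comment can be tried from the command
line. A small sketch, assuming N is a positive integer; the function name
top_lines is hypothetical.

```shell
#!/bin/sh
# top_lines N: copy the first N lines of standard input to standard
# output. sed quits after line N, so it behaves like head for N >= 1.
top_lines() {
    sed "${1}q"
}

printf 'one\ntwo\nthree\nfour\n' | top_lines 2    # prints one and two
```

If the input has fewer than N lines, all of them are printed.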
8.6 Using cat or echo in scripts?
I have seen a lot of examples that use 'echo', i.e.,
:0
* condition
| echo "first line of message" \
"second ..." \
"et cetera" |
I started out with spam.rc from "ariel" which got me into the
habit of
:0
* condition
| cat file_containing_message |
although I note that spam.rc did have one recipe using the echo
method. What are the reasons for choosing each method over the
other?
Here is a comparison. Choose the one you think is best for you:
- Echos don't depend on an external file:
everything is contained in the .procmailrc file. Echos keep
all the relevant stuff in one file. Cats make you
maintain multiple files. That's the main
reason I lean toward echos; you may have accounts on
several machines. It is easier to be able to copy just one
generic .procmailrc between them without having to copy a bunch
of messages also. Mostly, though, there's no real difference
between the two methods.
- Echo is easier to use with variables.
- Echo starts many processes, cat only starts one, but this is
not always true: In most current Bourne shell implementations,
echo is a built-in. This holds true with tcsh too.
- The main problem I see with the use of cat is "what happens when
you forget the file or destroy it?". I suggest that you at least
test that the file is readable before catting it.
- [richard] An argument against echo is that it is not well
standardized, and different versions may exist on the same
machine. Some recognize -n, some don't; some recognize embedded
metacharacters, some don't. This is an argument in favor of
print. Print, however, is not a built-in on all systems. The
comment on built-ins is pertinent to situations when a shell is
spawned. When procmail handles the call directly, it will
always look for a stand-alone executable. I guess echo may be
better, as long as we are aware of any differences in behavior
between built-in and stand-alone versions.
8.7 How to run an extra shell command as a side effect?
[jari] I was once wondering what would be the wisest way to send
messages to my daily "biff" log file about the events that
happened during my .procmailrc execution. This is how [david]
commented on my ideas
# case 1: print to BiffLog
dummy = `echo "message: $FROM $SUBJECT" >> $biff` |
[david] Problems: you get no locking on the destination file, and
unless you put it inside braces you have to run it on every message
unconditionally. (Also procmail tries to feed the whole message to
a command that won't read it, but the remedies for that don't help
very much.)
# case 2: This is a delivering recipe, which consumes the message,
# so we have to use the `c' flag.
:0 whic:
| echo "message: $FROM $SUBJECT" >> $biff |
Here it locks the destination file and you can add conditions to
it, so it's probably the best. If the head or the body is less than
one bufferful, you can limit the unnecessarily written data with h
or b, but I think that in most OSes a partial buffer and a full
one are the same amount of effort.
# case 3: We use side effect of "?" here. Cool, but this
# doesn't do $biff file locking thus message order may
# not be what you expect.
:0
* condition
* ? echo message: $FROM $SUBJECT >> $biff
{ } # procmail no-op |
We have conditions possible, but there is no locking on the
destination file. I'd go with method #2 or a variation thereof:
:0 hic: # we don't necessarily need `w'
* condition
| echo message: $FROM $SUBJECT >> $biff
:0 hi: # Or you could use this
* condition
dummy=| echo message: $FROM $SUBJECT >> $biff |
[jari] Now that [david] has explained how the various ways differ
from each other, I present the recipe where I used case 3.
When I was dropping a message to a folder, I wanted to send a
message to my biff log too. The idea is that the drop-conditions
have already matched and then we run extra command by using side
effect of "?" token. As far as the recipe is concerned, the "?"
is a no-op. The pedantic way would have been to add a LOCKFILE
around the recipe, but imagine 50 similar recipes like
this...and you understand why the LOCKFILE was left out. It's
only necessary if you worry about sequential writing to the biff
file.
:0 :
* drop-condition
* ? echo message: $FROM $SUBJECT >> $biff
$MBOX |
8.8 Forcing "ok" return status from shell script
...the "?" trick only allows running some additional shell
commands (true command always succeeds) while conditions
above have already determined that drop will take place. And you
can always make condition to succeed if a misbehaving shell script
always returns a failure exit code.
* ? misbehaving-shell-script || true |
[david] If the script always returns a failure code, just do this:
* ! ? misbehaving-shell-script
The more complex case is a script that can return either success or
failure but you don't care which; if the drop conditions passed,
you want to run the action line. echo can also fail if the
process lacks permission or opportunity to write to stdout. A more
reliable choice is true(1); its purpose in life is to do nothing
but exit with status 0.
The command : is a shell built-in which always returns true
status. Although not exactly more readable than true(1), "|| :" will
save the invocation of true (unless true is built into $SHELL), but
procmail will still run a shell. On the other hand, as long as the
command itself has no characters from SHELLMETAS, a weight of 1^1 and no
"|| anything" will avoid the shell process as well.
However, there is yet a better way to make sure that a failure by the
script doesn't make procmail abort the recipe:
:0 flags
* other conditions
* 1^1 ? shell-script
action |
Regardless of the exit status of the script, the condition will score
1 and not interfere with procmail's decision about the action line of
the recipe. Weighted exit code conditions behave like this (see the
procmailsc(5) man page):
* w^x ? command |
scores w on success or x on failure.
* w^x ! ? command |
scores the same as this:
* w^x pattern_that_appears_in_the_search_area_$?_times |
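The "|| true" and "|| :" idioms discussed above can be checked in a plain
shell. In this sketch, flaky is a hypothetical stand-in for the
misbehaving script.

```shell
#!/bin/sh
# flaky is a stand-in for a misbehaving script: it always fails.
flaky() {
    return 3
}

flaky || true    # the exit status of the whole list is 0
flaky || :       # same result, using the shell built-in
```

Either form hides the failure from whatever inspects the exit status,
which is why it only makes sense when the status really does not matter.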
8.9 Using grep with file lists to match messages
If you want to implement blacklisting or whitelisting, here is the
idea of how to call the grep program to do the matching. First, suppose
you want to match against bad words. The following example supposes GNU
tools for sed and egrep. The regexp variable is read from the file by
first calling tr to produce "this |that|\<get this!|", which is a
combined string from the file's lines. It is further post-processed
with sed by a) deleting any trailing whitespace before (|) and
by b) deleting unnecessary trailing or(|) token(s) added by tr.
EGREP = "/bin/egrep"
SED = "/bin/sed"
TR = "/usr/bin/tr"
file = $HOME/procmail/spam-regexp.lst
regexp = `$TR '\n' '|' < $file | $SED -e "s/[ \t]*|/|/g ; s/|*$//" `
:0 HBw
* ! regexp ?? ^^^^
*$ ? $EGREP --quiet --ignore-case --regexp='$regexp'
{
# Matches, do something
} |
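The tr and sed pipeline can be tested from the command line before it goes
into the recipe. The sketch below uses POSIX character classes and basic
regular expressions; build_regexp is a hypothetical name, and the file is
assumed to contain no empty lines (an empty alternative would match
everything).

```shell
#!/bin/sh
# build_regexp FILE: join the lines of FILE with "|", trim whitespace
# before each "|" and drop the trailing "|" that tr leaves behind.
build_regexp() {
    tr '\n' '|' < "$1" | sed -e 's/[[:space:]]*|/|/g' -e 's/|*$//'
}
```

The result can then be handed to grep, for example with
grep -E -i -q -e "$(build_regexp file)".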
It is a little easier to check the sender's address against a
whitelist, because it is possible to use "word" based checking in
contrast to the regular expression checking above. Supposing that the
file contains known email addresses listed one per line, the
recipe would be:
file = $HOME/procmail/spam.keywords
searchFields = "-xSender: -xFrom -xFrom: -xReturn-Path: -xReply-To:"
:0 w
*$ ? $FORMAIL $searchFields | $EGREP --quiet --ignore-case --file='$file'
{
# This sender is known
} |
A word of caution: whitelist or blacklist based sender
matching does not work 100%. Spammers hijack large numbers of
other people's email addresses, which they ruthlessly use as
the message's sender. It is no surprise to receive
Unsolicited Bulk Email from a friend – he is not the real sender,
but his address has drifted into the spammers' email database.
9.0 Using dates efficiently
Note: See the module list, where you will find date and time
parsing modules. You can also parse the date from the first
Received or From_ header if it is the same each time in your
system. That would be orders of magnitude faster and decreases
your system load if you receive a lot of mail.
Calling date in your procmail script many times is not a good
idea. Use the MATCH as much as possible to be efficient in
procmail, like below where we call date only once. If you are not
in the same time zone as your server, and you want an accurate
report of the date, you might amend the invocation to the following:
date = `TZ="KDT9:30KST10:00,64/5:00,303/20:00";date "+%Y %m %d"` |
The basic recipe is here
# By [richard] add %H:%M%S if you want these as well
:0
* date ?? ^^()\/....
{
YYYY = $MATCH
}
:0
* date ?? ^^..\/..
{
YY = $MATCH
}
:0
* date ?? ^^.....\/..
{
MM = $MATCH
}
:0
* date ?? ()\/..^^
{
DD = $MATCH
}
TODAY = "$YYYY-$MM-$DD" # ISO std date: like 1997-12-01 |
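The same "call date once, then split the string" idea can be sketched in
a shell script, for instance in a cron job that prepares values for
procmail. split_date is a hypothetical name; the variable names follow the
recipe above.

```shell
#!/bin/sh
# split_date "YYYY MM DD": set YYYY, YY, MM, DD and TODAY from one string.
split_date() {
    set -- $1                 # split on whitespace into $1 $2 $3
    YYYY=$1 MM=$2 DD=$3
    YY=${YYYY#??}             # last two digits of the year
    TODAY="$YYYY-$MM-$DD"
}

split_date "$(date '+%Y %m %d')"    # date is called only once
echo "$TODAY"
```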
9.1 Keep simple header log
Here is a simple strategy: record everything that comes in and
everything that happened to each message. See how brief info is
constantly recorded to the BIFF folder. You can now check the BIFF log
every day to see if the messages were sunk to the right folders.
Remember to add the BIFF rule to every recipe, so that the sink message
[sunk-somewhere] is recorded after the incoming message headers.
Emacs can display the file in one buffer window and keep it updated
with M-x auto-revert-mode. It gives a nice overview of arriving
mail messages and is equivalent to biff(1).
# this requires that HH and MM have been setup before,
# see pm-jadate.rc
NOW = "$HH:$MM" # the time only
TODAY = "$YY-$MM-$DD $NOW" # ISO 8601: date and time
NULL = $SPOOL/junk.null.spool # /dev/null is dangerous
BIFF = $PMSRC/pm-biff.log
# If you prefer a log per day (easy for cleanup):
# BIFF = $PMSRC/pm-biff.log.$YYYY$MM$DD
# .............................................. headers ...
# DON'T USE THESE: they call shell
#
# FROM = `$FORMAIL -zxFrom:`
# SUBJECT = `$FORMAIL -zxSubject:`
:0 # Use procmail match feature
* ^From:\/.*
{
FROM = "$MATCH"
}
:0 # Use procmail match feature
* ^Subject:\/.*
{
SUBJECT = "$MATCH"
}
# ............................................. incoming ...
# record log of incoming mail
:0 hwic:
| echo "$TODAY $FROM $SUBJECT" >> $BIFF
# ......................................... null recipe ...
# Now, this is how you add the "message" what happened
# to that mail. See "?" shell call in the recipe
:0 :
* From:.*(remove|delete|free|friend@)
* ? echo " [null-AddrReject]" >> $BIFF
$NULL |
9.2 Gzipping messages
[Sean B. Straw <PSE-L A T mail.professional.org>] On the recipe
delivery line where you'd normally be tossing it into a folder do
this instead:
:0 c:
|gzip -9fc >> $MAILDIR/mail.mbox.gz |
This will compress each message as it comes in (and since most are
TEXT, it does a fine job - MIME, OTOH is one of the best ways to
mailbomb someone since it doesn't compress well - but the indirect
bombing via mailing lists doesn't do this), reducing the disk space
required, usually dramatically. Done in conjunction with something
like the following at the end of your .procmailrc, you could have a
header file you could quickly rummage through looking for valid
messages to add to a procmail recipe, then run:
gzip -d -c mail.mbox.gz | formail -s procmail -m recipe.rc |
(note that if the recipe delivers into the mail.mbox.gz file on any
condition, then you should look to MOVE the file before running
this process, and use the moved version. In fact, this would be a
good idea anyway, as newly delivered mail may appear in the end of
the gzip file while you're doing this - and since your ultimate
goal is to be able to eliminate junk, you'll want to know that
after you've processed a gzipped mail file, you can delete it
without accidentally whacking new mail).
:0
* LASTFOLDER ?? ^^^^
{
# Save the message in case we need to retrieve it.
:0 c:
|gzip -9fc >> $MAILDIR/mail.mbox.gz
# copy headers for easy browsing - including being able to
# identify lists you're being subscribed to.
:0 h:
header.log
}
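Appending with ">>" works because a gzip file may hold several
compressed members back to back, and a decompressor reads them as one
stream. The Python sketch below only illustrates that property; it is
not part of the recipe, and the two sample messages and the helper
name append_message are made up for the illustration.

```python
import gzip


def append_message(archive: bytes, message: bytes) -> bytes:
    """Model `gzip -9fc >> mail.mbox.gz`: compress one message on its own
    and append the resulting gzip member to the existing archive bytes."""
    return archive + gzip.compress(message, compresslevel=9)


# Two made-up mbox messages, delivered one after the other.
first = b"From alice@example.com Mon Jan  1 00:00:00 2001\nSubject: one\n\nbody 1\n\n"
second = b"From bob@example.com Mon Jan  1 00:05:00 2001\nSubject: two\n\nbody 2\n\n"

archive = b""
for msg in (first, second):
    archive = append_message(archive, msg)

# Like `gzip -d -c mail.mbox.gz`: all members come back as one mbox stream.
restored = gzip.decompress(archive)
```

Each message is compressed separately, so the ratio is a little worse
than compressing the whole mailbox at once, but nothing has to be
rewritten on delivery.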
9.3 Emergency stop for your .procmailrc
[jari] If I have bad luck while testing a new recipe, it may run in
a loop and send me mail messages continuously. I then have to quickly
open .procmailrc and start disabling my individual "control" recipe
files. In situations like this, where every second is important,
there must be a better way.
[alan] This is quite easy already; put this at the top of your
procmailrc:
# instead of a leading-dot file, you may prefer
# stopFile = $HOME/procmailrc.stop which shows up in default ls.
# On the other hand you can do ls ~/.procmail* to see both...
stopFile = $HOME/.procmailrc.stop
:0
*$ $IS_EXIST $stopFile
{
EXITCODE = $EX_TEMPFAIL # Means: retry later; requeue
HOST = "_stopped_by_external_request_"
}
Then, when you are testing your procmailrc and disaster happens, you
can simply do the following to disable your procmailrc filtering.
% touch $HOME/.procmailrc.stop
[richard] This is also a candidate recipe for including in
an INCLUDERC. Combining the two ideas, we have a file
procmailrc.stop which contains the recipe and is included near the
top of .procmailrc. When you don't want it, mv it to procmailrc.go.
Procmail complains about missing INCLUDERCs, but it does not
complain about them if they exist and are empty. That is another
reason not to use dotted file names, and to use cp instead of mv.
10.0 Scoring
10.1 Using scores by an example
First make all the needed matches and let the SCORE value be
set. Examine the score after the final value has been calculated.
The condition lines say:
- Start with some threshold: -250.
- Read the subject into MATCH.
- Add 50 for each match of !. Notice the "^1": if it read
"^0", only one 50 would be added for "!!!!"; with "^1" that counts
as 4 x 50 = 200. See procmailsc(1) for the "^N" syntax.
- Any dollar sign is likely spam.
- Find uninteresting subject words.
- Add a negative count for replies.
- Usually spam doesn't seem to have Re: in the subject field
(but don't rely on this, spammers have started to use "re:").
- Characters such as !!! found in the body usually
indicate spam. Add 100 for each match.
# Idea by 26 Sep 97 Stephane Bortzmeyer <bortzmeyer A T pasteur.fr>
:0
* -250 ^0
* ^Subject:\/.+$
* 50 ^1 MATCH ?? [!]
* 50 ^1 MATCH ?? [$]
* 100 ^1 MATCH ?? ()\<(free|sex|opportunity|money|great)\>
* -250 ^0 ^Subject: *(Fwd|Fw|re):
* 100^0 B ?? ()!!!
{ } # official procmail no-op
SCORE = $= # Score has been calculated
:0 fhw
| $FORMAIL -i "X-Spam-Score: scored $SCORE"
:0: # If score had positive value, sink message
*$ $SCORE^0
junk.spam.mbox
Given the following subject:
"Great opportunity for free sex; no money required!!!!"
procmail scores it this way: ! was found 4 times (200/weight 50),
"free|sex..." regexp matched 4 times (400/weight 100).
                  condition score   Total sum so far
                  ---------------   ----------------
procmail: Score:       -250              -250 ""
procmail: Score:        200               -50 "[!]"
procmail: Score:          0               -50 "[$]"
procmail: Score:        400               350 "^Subject:.*\<free|sex|...>"
procmail: Score:          0               350 "^Subject: *(Fwd|Fw|re):"
procmail: Score:          0               350 ! ""
procmail: Assigning "SCORE=350"
[david] Some notes on possible regexps and their differences:
* 100^1 ^Subject:.*\<(free|sex|opportunity|money|great)\>
That condition says to score 100 for every subject line that
contains any of those five words ... not to score 100 for every one
of those words in the subject, but 100 for every subject line that
contains any of those words. So it will never score more than 100
unless there are multiple subject lines. You see, it offers five
alternative regexps:
^Subject:.*\<free\>
^Subject:.*\<sex\>
^Subject:.*\<opportunity\>
^Subject:.*\<money\>
^Subject:.*\<great\>
Offhand, I think the regexp below would score 400: 100 for
"Subject.*free" and 100 for "sex" etc. Of course, the score might
be higher if other lines in the header included the strings "sex",
"opportunity", "money", or "great<word border>", but appearances of
"<word border>free" outside the subject wouldn't be counted.
* 100^1 ^Subject:.*\<free|sex|opportunity|money|great\>
[translates to]
^Subject:.*\<free
sex
opportunity
money
great\>
And this one would score 400 too. How? MATCH would contain whole
subject and there would be non-overlapping matches to " great ", "
opportunity ", and " free ". If we got rid of either or both of the
word-border marks, it would score 500.
Subject: Great opportunity for free sex; no money required!!!!
* 100^1 MATCH ?? ()\<(free|sex|money|opportunity|great)\>
10.2 Brief Score tutorial
#todo: test
[elijah] If you're serious about using scores, please spend
a minute reading this short example.
VERBOSE = "yes"
:0
* 1^1 foo
* -2^2 bar
{ }
a = $=
:0
* 1^1 foo
* -2^2 bar
{
:0 f
| echo Whee: fun ; cat -
}
b = $=
:0
* 1^1 foo
* -2^2 bar
{
whee = "fun"
}
c = $=
:0 h
/dev/null
Then, if you send this message:
From foo Fooof
To: bar
Subject foobar
body-something-here
the log file will tell you what happened:
procmail: [20175] Fri Sep 26 10:25:23 1997
procmail: Score: 3 3 "foo"
procmail: Score: -6 -3 "bar"
procmail: Assigning "a=-3"
procmail: Score: 3 3 "foo"
procmail: Score: -6 -3 "bar"
procmail: Assigning "b=0"
procmail: Score: 3 3 "foo"
procmail: Score: -6 -3 "bar"
procmail: Assigning "c=-3"
procmail: Assigning "LASTFOLDER=/dev/null"
procmail: Opening "/dev/null"
From foo Fooof
Folder: /dev/null 46
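The 3 and -6 in the log follow from the weighting series as I read it
in the procmailsc manual page: for a condition "w^x regexp" the first
match adds w, the second w*x, the third w*x*x, and so on. The Python
sketch below is only a model of that arithmetic, not procmail code;
the helper names weighted_score and count_matches are made up.

```python
import re


def weighted_score(weight: float, exponent: float, matches: int) -> float:
    """Sum weight * exponent**k for k = 0 .. matches-1, i.e. the series
    w, w*x, w*x^2, ... for a `w^x regexp` condition."""
    return sum(weight * exponent ** k for k in range(matches))


def count_matches(pattern: str, text: str) -> int:
    """Count non-overlapping matches, ignoring case as procmail does by default."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Header of the test message from the example above.
header = "From foo Fooof\nTo: bar\nSubject foobar\n"

foo = weighted_score(1, 1, count_matches("foo", header))   # 1 + 1 + 1
bar = weighted_score(-2, 2, count_matches("bar", header))  # -2 + -4
total = foo + bar
```

With an exponent of 0 only the first match counts, which is the
difference between "^0" and "^1" pointed out in section 10.1.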
10.3 Score's scope
If you have a delivering recipe and the score is positive, the
action lines are executed. If the score is less than or equal to 0,
the $= information is lost. It is also lost at the next recipe
definition, even if that recipe is never executed. Study the following
example:
:0
* 10^0
{
dummy = "Score for condition xxxx was: $= $NL"
:0
{
dummy = "Next recipe, Score no longer available: $= $NL"
}
}
# Won't work. $= is getting set back to 0 outside of
# the delivering recipe.
dummy = "Score outside of all recipes: $= $NL"
Here is an interesting anomaly which [richard] discovered. It is
presented here only as a curiosity. DO NOT USE IT IN YOUR RECIPES.
(This is not "clean programming", but a hack.)
[david] If you want to save the score for later use (even if it is
zero or negative):
:0
* 10^0
{ } # procmail no-op
SCORE = $=
:0 A
action_if_positive
If other recipes that clobber the references for the A flag
intervene, this will work:
:0
* 10^0
{ } # procmail no-op
SCORE = $=
... more stuff ...
:0
*$ $SCORE^0
action_if_positive
10.4 Counting length of a string
Supposing VAR contains some text, we can count the characters
by using dot to match every character and increasing score for
every match.
:0
* 1^1 VAR ?? .
{ }
LENGTH = $=
10.5 Counting lines in a message (Adding Lines: header)
[1995-10-03 PM-L Idea by David Karr <dkarr A T nmo.gtegsc.com>] [david]
later corrected 1998-01-02: For one thing, the second condition
always counts one too many (the final newline plus the closing
putative newline create the extra match); second, after making that
correction, an empty body would score zero and leave the variable
undefined.
:0
* 1^1 .
* 1^1 ^.*$
* -1^0
{ }
lines = $=
:0 fhw
* ! ^Lines:
| $FORMAIL -a "Lines: $lines"
The reason we used it at all was that size conditions worked only on
the entire text regardless of H or B or HB flags at the top of the
recipe. Nowadays we can do this and get the accurate figure in one
condition:
# leave `B ??' out to measure the entire message
:0
* 1^1 B ?? > 1
{ }
size = $=
If you want to be silly about it (as some of us very often do),
:0
* -1^1 B ?? > -1
{ }
size = $=
gives the same result, and as long as the search area is non-empty,
so do these, which are even sillier:
:0
* 1^-1 B ?? < 1
{ }
size = $=
:0
* -1^-1 B ?? < -1
{ }
size = $=
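Why all four variants give the same figure can be checked with the
formulas that, as far as I recall, the procmailsc manual page gives
for weighted size conditions: "w^x > b" scores w*(M/b)^x and
"w^x < b" scores w*(b/M)^x, where M is the size of the search area.
The Python sketch below just evaluates those two formulas; the
function names and the sample size are assumptions made for the
illustration.

```python
import math


def score_greater(w: float, x: int, b: float, size: int) -> float:
    """Score of `* w^x > b` under the assumed formula w * (M/b)**x."""
    return w * (size / b) ** x


def score_less(w: float, x: int, b: float, size: int) -> float:
    """Score of `* w^x < b` under the assumed formula w * (b/M)**x."""
    return w * (b / size) ** x


M = 1234  # made-up body size in bytes; must be non-zero for the `<` forms

variants = [
    score_greater(1, 1, 1, M),    # * 1^1   B ?? > 1
    score_greater(-1, 1, -1, M),  # * -1^1  B ?? > -1
    score_less(1, -1, 1, M),      # * 1^-1  B ?? < 1
    score_less(-1, -1, -1, M),    # * -1^-1 B ?? < -1
]
```

The two "<" forms divide by M, which is why they only work as long as
the search area is non-empty.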
[Karr] This recipe counts the bytes in the message; you could use it
as a Content-Length replacement, but prefer using the next recipe. The
first score counts every character, and the second score adds one for
every line (that is: the newlines are added).
:0 H # use B to measure body only
* 1^1 B ?? .
* 1^1 B ?? ^.*$
{
textsize = $=
:0 fhw
* ! ^Content-length:
| $FORMAIL -a "Content-length: $textsize"
}
10.6 Determining if body is longer than header
:0
* 1^1 B ?? > 1
* -1^1 H ?? > 1
{
..body was longer
}
10.7 Matching last Received header
[david] Here is a way to use scores to hit the bottommost Received
header.
:0
*$ 1^1 ^Received:.*by$s+\/.*
action
10.8 Testing value range with scoring (bogofilter)
Bogofilter adds a header to the message that contains the
probability score of the message being spam, in the range 0.0 - 1.0:
X-Bogosity: No, tests=bogofilter, spamicity=0.365761 ...
If the filter runs at the MTA, the thresholds that determine the word
"No" cannot necessarily be configured. Testing the resulting score
directly, to catch messages in the range 0.2 - 0.9 as "Unsure", can be
done with scoring. If the spamicity value was 0.92, the first score
would return 1.90 - 0.92 = 0.98, which is lower than 1, the value
the score needs to count as OK.
:0
* ^X-Bogosity:.*spamicity=\/0\.[0-9][0-9][0-9]
{
# check for maximum
:0
* $ -$MATCH^0
* 1.90^0
{
# check for minimum
:0:
* $ $MATCH^0
* 0.8^0
{
# Value is between A .. B
}
}
}
10.9 How to add Content-Length header
We use procmail for local delivery, and would like to get it
to generate the content-length header, if one doesn't exist. SUN-OS
mailtool at least gets confused and merges messages together if
there is no message body.
[stephen] All you need to do is: a) Make sure that procmail is started
without the -Y flag. b) Either, in your sendmail.cf, insert:
H?l?Content-Length: 0000000000
Or (slightly less efficient), insert the following recipe in your
/etc/procmailrc file and Procmail will take care of any necessary
magic.
:0 hfw
* !^Content-Length:
| /usr/bin/formail -a "Content-Length: 0000000000"
10.10 Testing message size or number of lines
Size conditions ignore H and B on the flag line and always work on
HB unless another search area is specified on the condition's own
line. To test only the body,
:0 # Note: this is in BYTES
*$ B ?? < $NBR
{
...whatever when fewer bytes
}
This syntax would obey a B flag on the flag line:
:0 # Note: this counts LINES
* -1^1 B ?? .
* -1^1 B ?? ^.*$
*$ $NBR^0
{
...whatever when fewer lines
}
10.11 Counting commas with recursive includerc
[jari] Foreword: David and Phil really are experts with procmail,
and let this section serve as an example of "what on Earth is a
recursive procmailrc and how is it used?". I would not personally
use a recursive includerc, simply because I would not trade away
clarity: I find the awk call below easier to understand and maintain.
split just explodes the input at each comma, and print returns how
many elements were put into array a (one more than the number of
commas). The performance hit is no bigger than that of the forked
procmail binaries in the recursive version.
:0
* ^CC:\/.*
{
field = $MATCH
saved = $SHELLMETAS
SHELLMETAS
commaCount = `echo $field | awk '{print split($0,a,",")}' `
SHELLMETAS = $saved
}
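Note that awk's split returns the number of pieces, not the number of
commas. The Python sketch below models just that arithmetic; the
function name awk_split_count and the sample Cc value are made up.

```python
def awk_split_count(field: str) -> int:
    """What awk's split($0, a, ",") returns: the number of pieces.
    awk returns 0 for an empty string, so that case is handled apart."""
    return len(field.split(",")) if field else 0


cc = "a@example.com, b@example.com, c@example.com"  # made-up Cc value

pieces = awk_split_count(cc)  # what commaCount would hold
commas = cc.count(",")        # the actual number of commas
```

So commaCount is really a recipient estimate: subtract one if you
need the comma count itself.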
[richard] Here is a recipe that needs no recursion. MAX_RECIP
is set to 9, but you may prefer some other value. This counts each
comma, even though commas are also allowed inside addresses. Some
folks sum Resent-xx or non-Resent-xx headers. I sum all.
:0
* 1^1 ^(resent|apparently-)?(to|b?cc):\/.*
* 1^1 MATCH ?? ,
*$ -$MAX_RECIP^0
{
:0
*$ $=^0
*$ $MAX_RECIP^0
{
RESULT = "Count of commas is $="
}
}
11.0 Formail usage
11.1 Fetching fields with formail -x
If you're new to procmail, your first thought for reading a header's
content from the message might be to call:
SUBJECT = `$FORMAIL -xSubject:`
That's not good. DON'T DO THAT. You just created an expensive shell
subprocess where procmail calls formail and feeds the full message to
it. We can do the same with minimum effort:
:0
* ^Subject:\/.*
{
SUBJECT = $MATCH
}
No shell subprocess is called. This is much faster and consumes
fewer resources, though it may need more typing. Use it and
your sysadmin will be happy with your well-behaved procmail
recipes that don't load the CPU unnecessarily. The equivalent
with formail might be more secure, because it contains a full
RFC-compliant parser. The traditional way of deriving the
address with formail is:
FROM = `$FORMAIL -rtzxTo:`
But you can still make this more efficient. Here is one example where
you actually want to use "old" =| style variable assignment,
make sure there are no extra spaces:
:0 hw
FROM=|$FORMAIL -rtzxTo:
That way only the header gets fed into formail, whereas the
previous backtick version fed the whole message. Another benefit is
that you can then check the return code of formail with an "a" or
"A" recipe after this one.
11.2 Always use formail's -rt switch
[Philip] As of version 3.14 you should now usually leave out the
-t. To quote the formail manpage:
By default, when generating an auto-reply header procmail selects
the envelope sender from the input message. This is correct for
vacation messages and other automatic replies regarding the
routing or delivery of the original message. If the sender is
expecting a reply or the reply is being generated in response to
the contents of the original message then the -t option should be
used.
11.2.1 For procmail versions prior to 3.14
[FAQ] -r breaks RFC822, so always use -rt if you don't know
what this means. Perhaps you should always use it anyway.
[david] There is formail -rt rank bar graph in the source code of
3.11pre4. It might be easier to follow as a top-to-bottom listing
(and again, Tom Zeltwanger appears to be using one of the older
versions where From_ was mistakenly over promoted). These are the
rankings in version 3.11pre4:
formail -r:          formail -rt:
Resent-Reply-To:     Resent-Reply-To:
Resent-Sender:       Resent-From:
Resent-From:         Resent-Sender:
Return-Receipt-To:   Reply-To:
Errors-To:           From:
Reply-To:            Sender:
Sender:              Return-Receipt-To:
From_                Errors-To:
Return-Path:         Return-Path:
Path:                From_
From:                Path:
[Stephane Bortzmeyer <bortzmeyer A T pasteur.fr>] Always use -rt and
never -r, because such precedence (Sender over From) is an
important violation of RFC 822. There is one canonical order,
described in the RFC, and nothing else should be used, such as fuzzy
ranking or, worse, reordering. This is a serious problem with
formail.
The proper order is:
Reply-To, else From, else Sender, else <error>
And, how would you deal with resent mail?? Ie: Resent-Reply-To,
Resent-From, and Resent-Sender?
It treats Resent-X as X (" Whenever the string Resent- begins a
field name, the field has the same semantics as a field whose name
does not have the prefix. "). So you have to choose an order between
them, the RFC does not specify it.
[david] I think that the idea is that -r is intended to determine
the origination address, not the place to reply; -rt is for
determining the place to send replies. For addressing a response,
yes, -rt will invert the header in a way more in line with the
rules; for figuring out the origination point,
might be better than
And here's an additional problem: formail -rD always uses the
-r precedences; you can't make it use the -rt precedences
and the -D cache checking function at the same time.
4.4.4. AUTOMATIC USE OF FROM / SENDER / REPLY-TO (RFC 822 excerpt)
For systems which automatically generate address lists for
replies to messages, the following recommendations are made:
- The Sender field mailbox should be sent notices of
any problems in transport or delivery of the original
messages. If there is no Sender field, then the
From field mailbox should be used.
- The Sender field mailbox should NEVER be used
automatically, in a recipient's reply message.
- If the Reply-To field exists, then the reply should
go to the addresses indicated in that field and not to
the address(es) indicated in the From field.
- If there is a "From" field, but no Reply-To field,
the reply should be sent to the address(es) indicated
in the From field.
Sometimes, a recipient may actually wish to communicate with the
person that initiated the message transfer. In such cases, it is
reasonable to use the Sender address.
This recommendation is intended only for automated use of
originator-fields and is not intended to suggest that replies may
not also be sent to other recipients of messages. It is up to the
respective mail-handling programs to decide what additional
facilities will be provided.
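The recommendations in the excerpt can be written down as two small
lookups. The Python sketch below is one reading of the RFC text, not
how formail is implemented; the function names and the sample message
are made up, and for simplicity only the first address of a field is
returned.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Optional


def first_address(msg: Message, field_names: tuple) -> Optional[str]:
    """Return the address from the first listed field that is present."""
    for name in field_names:
        value = msg.get(name)
        if value:
            # parseaddr returns (display name, address); keep the address.
            return parseaddr(value)[1] or None
    return None


def reply_address(msg: Message) -> Optional[str]:
    """Reply-To if it exists, else From; Sender is never used for replies."""
    return first_address(msg, ("Reply-To", "From"))


def problem_notice_address(msg: Message) -> Optional[str]:
    """Sender if it exists, else From, for transport or delivery problems."""
    return first_address(msg, ("Sender", "From"))


raw = (
    "From: Joe <joe@example.com>\n"
    "Sender: secretary@example.com\n"
    "Reply-To: list@example.org\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
with_reply_to = reply_address(msg)
notice_to = problem_notice_address(msg)

del msg["Reply-To"]
without_reply_to = reply_address(msg)
```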
11.3 Using -rt and rewriting the From address
Sendmail adds the From header, which points to your account. But in
some cases you may wish to rewrite the From.
- You respond to a spammer and you want to hide your address to
some extent. (The headers will still be there, but at least
hitting r in most MUAs picks up the From.)
- You want to rewrite From to show your virtual address
me@forever-lasting-address.com instead.
- You are in some other account currently, but you want to send a
message to some Net service (e.g. a mailing list) that expects to
see the same address you used when you first subscribed.
You could also use Reply-To to signify where you want further
responses to go, but that doesn't hide your true From address. And
there are still MUAs that don't obey Reply-To. Whatever reason
you have to rewrite the From header, here is the command.
:0 fhw
| $FORMAIL -rt -I "From: me@forever-lasting-address.example.com"
11.4 Formail -rt and Resent-From header
Here is something that made me scratch my head a lot. Let's first
examine the scenario, which explains how the mail travels.
account --> virtual-address --> Local-address
In this chain I was sending a message from one of my accounts to
another address; the virtual-address delivers
the mail to the right local domain. There is only one problem with this
picture. When a response is generated from Local-address with
formail -rt, the generated address pointed back to
virtual-address, which of course pointed back to Local-address.
A loop was ready: the reply could never travel back to the
original address: account
What was happening here was that the mail server that handled the
virtual-address, didn't forward the message, but instead
resent the message. In this process a set of new headers were
generated:
Resent-From: <virtual-address>
X-From-Line: <account>
Received: from <the virtual-address mailserver>
Resent-Message-Id: <199710151903.WAA28670@virtual-address>
Resent-Date: <date>
Resent-To: <local-address>
Received: ...<account domain>
Message-Id: <199710151904.WAA05050@account-domain>
From: <account-domain>
And now when the formail -rt command was used, it picked the
Resent-From address as the destination where the reply should be sent.
Surprising, but according to procmail, 100% correct. Resent-From
has higher priority than From.
The Resent-* headers are considered informative, and should never
be used when automatically generating a response. The problem here
is the middleman: it should not resend a message, but rather
forward it. So I put this into my .procmailrc to handle the broken
middleman at our site.
# Remove that misleading Resent-From if it was added by our
# "middleman"
:0 fhw
* Resent-From: <our-domain>
| $FORMAIL -IResent-From:
[edward] adds to this: As you know, formail -rt is
for composing a response to the address from which an e-mail was
sent. Let's say you are on vacation and have set up a procmail
recipe to auto-respond to all e-mail you receive. Furthermore, let's
say Joe sends me an e-mail and I re-send it to you. If you wanted
to respond to the sender of the e-mail that you received, would you
e-mail me or Joe? You had better e-mail me, because I was the one who
sent it to you. Joe may not even know you. Imagine if you did send
your response to Joe. It would probably cause him considerable
confusion as to why you are sending him e-mail informing him that
you are on vacation.
formail -rt uses a heuristic algorithm to determine who it should
respond to, based on the presence of various headers and their
contents. If you look at the formail.c source code, you'll see a
graphical representation of this algorithm. It will also explain the
difference between the results of -r and -rt.
Resent-Reply-To has the highest relative importance/reliability of
all header fields. Next is Resent-From and Resent-Sender, followed
by Reply-To, From, Sender, et al.
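A strongly simplified model of this ranking shows why removing
Resent-From changes the reply target. The Python sketch below only
walks the -rt order listed for 3.11pre4 in section 11.2.1 and takes
the first field present; the real formail weighs fields and their
contents, so treat this as an illustration. The sample addresses and
the name pick_reply_header are made up.

```python
from email import message_from_string

# -rt ranking as listed in section 11.2.1, highest first. "From_" (the
# mbox envelope line) is left out because it is not an ordinary header.
RT_RANK = [
    "Resent-Reply-To", "Resent-From", "Resent-Sender", "Reply-To",
    "From", "Sender", "Return-Receipt-To", "Errors-To", "Return-Path",
    "Path",
]


def pick_reply_header(msg):
    """Return (field name, value) of the highest-ranked field present."""
    for name in RT_RANK:
        if msg[name] is not None:
            return name, msg[name]
    return None


raw = (
    "Resent-From: virtual-address@example.net\n"
    "From: account@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
before = pick_reply_header(msg)

# Same effect as `formail -IResent-From:` in the recipe above.
del msg["Resent-From"]
after = pick_reply_header(msg)
```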
11.5 Quoting the message
Use formail -rtk
11.6 Without quoting the message
Use formail -rkb or formail -rkt -p ''
11.7 How to include headers and body to the reply message
The idea is that you first capture the whole header in a
variable, then add it to the body of the message. Here a
custom message is added to the beginning and the headers next.
Notice that the original body is already added by rtk. Be
sure to have that space inside the braces; it is important.
#todo:
:0
* ^^\/(.+$)+$
{
header="$MATCH"
}
:0 fhw
| $FORMAIL -rt; ... now generate reply ...
11.8 Adding text to the beginning of message
We don't actually filter anything here. It's just a trick to
reprint headers and add some text after them: text appears
at the beginning of body.
:0 fhw
| cat - ; echo "This text comes after the headers."
11.9 Adding text to the end of message
:0 fb
| cat -; echo "added text after body"
11.10 Adding text before quoted message
If you are generating an auto-reply message where you want to place
the notification at the beginning of the body followed by the quoted
original message, here is a recipe for it. Substitute your own
condition to trigger the reply.
:0
* condition
{
:0 fhb
| $FORMAIL -rtk -p '>' \
-I "From: me@example.com" \
-I "$MYXLOOP"
:0 fhw
| cat -; echo "added message at the start of body"
}
12.10 How to truncate headers (save filing space)
[Idea by Rodger Anderson <rodger A T hpbs2245.boi.hp.com>] As a last
recipe, if you're tight on space, you could remove extraneous
headers. But make sure you want to do that, because headers may
contain useful information about URLs and other things like mail
server addresses. Some people keep signature information in a
separate X-header (say: X-My-Info) instead of at the bottom of the
message so that it won't bother people and disturb reply quoting.
# Strip header to bare minimum
# If this is MIME multipart, then skip recipe
:0 fhw
* ! multipart
| $FORMAIL -k \
-X Date: \
-X Subject: \
-X Message-Id: \
-X From \
-X To: \
-X Cc: \
-X Reply-To: \
-X Mime-Version: \
-X Content-type:
:0 :
mail.default.mbox
[david] comments on the final recipe:
- You should keep the Reply-To header if there is one. If the
sender wanted replies directed to a different address than that
in the From header, you are losing that information and, when
you respond, writing to the wrong place.
- You ought to keep To and Cc so that you can tell when you read
your mail who else was sent it. If your mail user agent has a
group-reply or reply-all function, keeping To and Cc will allow
that feature to continue working. This way you are cheating
yourself out of it.
- '-X From' is enough to keep both the From_ line and the From
header. You don't need to specify -X From: again after it.
(To keep From_ without From: you need to say -X "From " or
something similar, with a quoted space.)
- All mail is going to have a line (usually two) beginning
'From'.
Another slightly different approach is to kill the headers that
take the most of the space. If you're not interested in tracking
down the original sender of possible UBE message, then you can
remove the Received headers. You may want to fill out the
condition line to simplify only your work or campus messages,
and let other messages retain their full headers.
:0 fhw
* possible-condition-to-handle-only-certain-messages
| $FORMAIL -I Received:
11.11 Adding extra headers from file
[stephen] Notice that the obvious solution won't do here:
:0 fhw
* condition
| $FORMAIL -rt | cat - $HOME/newHeaders
The problem here is that there will be a newline in the middle, which
causes the header to be shortened (procmail determines the new
header/body boundary after having processed each filter). Use the
following instead.
:0 fhw
* condition
| $FORMAIL -rt -X "" ; cat $HOME/pm-newHeaders.txt ; echo
[david] If $HOME/newHeaders ends in a blank line, you don't
need the "; echo". Under some circumstances procmail puts back the
blank separating line if it gets lost, but I'm not sure exactly what
those are, and you have a SHELLMETAS character in there already (the
first semicolon), so a shell is forked anyway.
But this is my favorite way (it assumes that formail -r will never
generate a continuation line for From:); if you use it, make sure
that the newHeaders file does NOT contain a trailing blank line:
:0 fhw
* whatever
| $FORMAIL -rtn
:0 A fhw
| sed "/^From:/r $HOME/newHeaders"
11.12 Splitting digest
[Idea by David Hunt] One interesting idea for handling digests
automatically as single messages is to call procmail
recursively. First call formail to split the mail when header fields
are contained in the body, calling procmail again as the
output program of formail. Insertion of X-Loop makes it possible to
reuse ~/.procmailrc for the separate messages.
# If it is more than one mail, send to formail for
# splitting, then send back to procmail for sorting again.
:0
* B ?? ^From [-_+.@a-z0-9]+ (Sun|Mon|Tue|Wed|Thu|Fri|Sat)
* B ?? ^From:
* B ?? ^TO
*$ ! H ?? ^$MYXLOOP
| $FORMAIL -A "$MYXLOOP" -m4s procmail
11.13 Mailbox: Splitting to individual files
[david] To split some old mail archives into individual
files while stripping unimportant header fields, use the following. The
keys are to use procmail's -p option, to strong-quote $FILENO in
the setting of DEFAULT, and to use /dev/null or a known empty file
as the rcfile.
% setenv FILENO 0000
% formail -kXDate: -XFrom: -XTo: -XSubject: -XIn-Reply-To: \
-XX-Mailer +1ds \
procmail -p DEFAULT=`pwd`/'$FILENO.txt' \
/dev/null < inputfile
11.14 Mailbox: Extracting all From addresses from mailbox
The -ns option causes formail to split the mailbox and feed each
mail separately to next process.
% formail -ns formail -xFrom: < mailbox | sort -u
11.15 Mailbox: Applying procmail recipe on whole mailbox
% formail -ns procmail pm-experiments.rc < mailbox
11.16 Mailbox: run series of commands for each mail (split mailbox)
...Maybe the heat has melted my brain, but I can't seem to get
formail to perform a series of commands on each mail that it has
split from a folder. Here's an example of a simple debugging
attempt: I've tried parentheses, putting the commands into a
shell function, and other flailings too numerous to remember, all
to naught.
% formail -s addr=`formail -XFrom: | formail -r | formail -zx To`;\
echo "$addr" >>output
It appears that formail doesn't use the shell when executing the
command specified when splitting. No SHELLMETAS here. Given that, the
secret is to fire up the shell explicitly yourself to do the piping:
% formail -s sh -c 'formail -XFrom: | formail -rzxTo:' >> output
Note that you only need two formails in the pipe, not three, as the -r
flag works correctly when combined with other flags.
...To me, a large mailbox would consist of about 10,000 messages
per month (that's about what I get). That would mean that my
mailbox would contain 60,000 messages in 6 months. I sure as heck
wouldn't want to skim through it all or even try to load it up in
an MUA.
[1998-08-27 Bennett Todd <bet A T mordor.net>] I also deal with monster
volumes of mail. I've switched over entirely to Maildir in all my
mail handling; the only place I still see mboxes is in the save
folders of my netnews reading (using slrn) and whenever I want to
process them I either convert them into Maildir (e.g. for archival)
or simply split them into multiple messages. Splitting into
multiple messages turns out to be preposterously easy; using GNU
csplit:
[richard] The csplit invocation shown here will catch
occurrences of ^From embedded in the message body if your MUA
hasn't escaped them with a >. Some MUAs use Content-Length
headers and don't escape ^From. Procmail supports this. Be cautious
if you choose to use this simple split.
csplit -n4 - '/^From /' '{*}'
That will create an empty xx0000 which I delete, and leave the
messages in files named xx0001, xx0002, etc. If you have more than
9999 messages in a folder then go -n6, or -n9, or whatever. Once
they're split it's really easy to use shell tools to bundle
messages into batches, file them into categories, etc.
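The caution about unescaped "From " lines is easy to reproduce. The
Python sketch below splits text the same naive way, at every line
that begins with "From "; it is not a replacement for csplit or
formail, and the sample mailbox and the name split_mbox are made up.
Unlike csplit it does not produce an empty first piece.

```python
def split_mbox(text: str) -> list[str]:
    """Start a new piece at every line that begins with "From "."""
    pieces: list[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") or not pieces:
            pieces.append("")
        pieces[-1] += line
    return pieces


# Two made-up messages; the first body contains an unescaped "From " line.
mbox_unescaped = (
    "From a@example.com Mon Jan  1 00:00:00 2001\n"
    "Subject: one\n"
    "\n"
    "From here on the body is not escaped.\n"
    "\n"
    "From b@example.com Mon Jan  1 00:05:00 2001\n"
    "Subject: two\n"
    "\n"
    "Second body.\n"
)
# The same mailbox as an MUA that escapes body lines with ">" would write it.
mbox_escaped = mbox_unescaped.replace("\nFrom here", "\n>From here")

wrong = split_mbox(mbox_unescaped)  # body line mistaken for a new message
right = split_mbox(mbox_escaped)
```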
If you are archiving all mail traffic forever (which I do) then
another dandy tool to add to the mix is glimpse,
http://glimpse.cs.arizona.edu/. It takes a while to build the index,
but that's a fine job to run out of cron at night. Once the index
is built it's a pleasingly quick way to root through big archives
of messages.
11.17 Option -D and cache
[Bob Weissman <b_weissm A T kla.com>] and [stephen] These files are
self-limiting. The number after the -D is the size in bytes above
which the older entries will be removed. E.g., my .procmailrc has
:0 Wh: .msgid.cache$LOCKEXT
|$FORMAIL -Y -D 12288 .msgid.cache
And the file never exceeds 12288 bytes by very much. Though
formail indeed exceeds this size by as much as the length of one
message-ID, the file size should never grow significantly beyond
that, even if used indefinitely. The file is in binary format, each
entry terminated by a single null byte, with an occasional (significant
placeholder) double null.
[philip] The format of the cache is initially as follows:
When the file size grows to equal-to or greater-than the size
specified on the command line, formail starts over at the
beginning, using a double-null to mark where it stopped. However,
entries after the double-null, except for the partially overwritten
one, are still valid and checked, so that the file is then in the
format:
entry\0entry\0entry\0\0partial-entry\0entry\0entry\0\0
New entries will be written after the first double-null, so that it
implements a circular cache. Check out lines 319-322 of formail.c.
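If you want to look inside such a cache file, the layout described
above is simple to take apart. The Python sketch below is inferred
only from that description, not from formail.c; the function name
read_cache and the sample Message-IDs are made up. It splits the data
at the first double null and lists the entries on each side.

```python
NUL = b"\x00"


def read_cache(data: bytes):
    """Split a cache image at the first double null. Returns (newer, older):
    entries written since the last wrap-around, and the entries after the
    marker, whose first element may be a partially overwritten one."""
    head, _marker, tail = data.partition(NUL * 2)
    newer = [entry for entry in head.split(NUL) if entry]
    older = [entry for entry in tail.split(NUL) if entry]
    return newer, older


# Made-up cache image in the wrapped-around format shown above.
data = (
    NUL.join([b"<1@a>", b"<2@a>", b"<3@a>"]) + NUL * 2
    + NUL.join([b"tial@a>", b"<8@a>", b"<9@a>"]) + NUL * 2
)

newer, older = read_cache(data)
```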
11.18 Option -D and message-id in the body
Some of my messages contain the original Message-ID in the body
of the letter and not the header. Is there an option for formail to
overcome this problem?
[david] This is strictly untested; I don't know where in the
body the Message-ID's appear, but if they're at the top of the body,
this might help:
:0 hW # Message-Id: in the head,
*$ ^Message-Id:.*$NSPC
| $FORMAIL -D $cache_size $cache_name
:0 E bW # If not, but there's one in the body, try the body.
*$ B ?? ^Message-Id:.*$NSPC
| $FORMAIL -D $cache_size $cache_name
You might want to copy a Message-Id from the body to the head in
any case (if there's none already in the head) just to have it in
the right place, so we could do that first and then formail -D
will work normally. This form will run formail twice if the
Message-Id header is in the body instead of the head, but it will
look for Message-Id on any line of the body, not just at the
top:
:0 fhw
*$ ! H ?? ^Message-Id:.*$NSPC
*$ B ?? ^\/Message-Id:.*$NSPC
| $FORMAIL -A "$MATCH"
:0 hW
| $FORMAIL -D $cache_size $cache_name
11.19 Reducing formail calls (conditionally adding fields)
#todo: url
Suppose you want to add fields to the message when some condition is met:
:0 # compose initial reply
| $FORMAIL -rt
:0
* condition1
| $FORMAIL -A "X-Header1: value1"
:0
* condition2
| $FORMAIL -A "X-Header2: value2"
Hm, we have three processes called here; can we minimize the calls?
Yes, this is an idea from [philip] and [david]. Notice that there is
only ONE process needed.
:0
* condition1
{
hdr1 = "-AX-Header1: value"
}
:0
* condition2
{
hdr2 = "-AX-Header2: value"
}
:0 fhw
| $FORMAIL -rt ${hdr1+"$hdr1"} ${hdr2+"$hdr2"}
And if you want to stack all headers to only one variable, it is
a bit of extra work. Below we use short variable names only because
of the line space: the calls fit on one line.
- field = all (f)ields stacked to one string.
- nl = continuation newline terminator of previous field
The recipe says: if field has previous value, set nl to newline
separator, later concat previous contents of field with possible
newline and new header field.
field # kill variable
:0
{
nl
nl = ${field+"$NL"}
field = "$field${nl}X-Header1: value"
}
:0
{
nl
nl = ${field+"$NL"}
field = "$field${nl}X-Header2: value"
}
:0 fhw # If we have something in *field*
* ! field ?? ^^^^
| $FORMAIL ${field+-A"$field"}
The above recipe was the most general one: each block determined
by itself whether field existed previously or not. But if you know
that field is already set, you can write a simpler recipe:
:0 # We know field has a value before our module
{
field = "$field${NL}X-Header1: value"
}
11.20 Formail -A -a options
You can't combine option -A with -a or -I in one call if the header
name is the same. In the example below the intent is to keep only one
definition of X-1, but the field added by -A isn't seen when -a is
applied.
formail -A "X-1: 1" -a "X-1: 2"
-->
X-1: 1
X-1: 2
Separate pipes, on the other hand, give you the desired results:
formail -A "X-1: 1" | formail -a "X-1: 2"
-->
X-1: 1
formail -A "X-1: 1" | formail -I "X-1: 2"
-->
X-1: 2
11.21 Formail -e -s options
[david] I had a file of alternating From and Date lines and
wanted to convert it into an mbox.
formail -dem2 -s < input > mailbox
should have done it, right? Nope; formail -s took it all as one
message, even with -m1. When I edited in blank lines, the command
worked. My first reaction was that the -e option wasn't working as
advertised and that the blank lines were necessary after all.
Then I realized the real problem: there was no interruption in the
succession of valid header lines in the input for anything that
could look like a body. I could have put something other than blank
lines between each pair of header fields and then -e would have done
its job, but as long as every additional line looked like a valid
RFC822 header field, even if its name was the same as one that had
appeared earlier, formail -s assumed that it was still the same
message's head.
12.0 Saving mailing list messages
12.1 Using subroutine pm-jalist.rc to detect mailing lists
Because I didn't have sendmail plus addressing capabilities
(explained in the next section), I wrote the module pm-jalist.rc. It
is included in pm-code.zip.
The subroutine tries to detect and derive the mailing list name
directly from the message. Many mailing list daemons (ezmlm,
SmartList, listserv, majordomo) use standardized headers from which
the list name can be picked. After this subroutine has been applied
to a message, the variable LIST contains the mailing list name. You
no longer have to manually insert separate recipes for each new
mailing list you subscribe to, because this subroutine adaptively
finds new mailing lists.
Once the mailing list name has been grabbed, you can easily "map"
or convert the name to any suitable folder name before saving it:
LIST           LIST name     Description of mailing list
(as grabbed)   you want
--------------------------------------------------------------
jde            java.jde      Java Development Env
java           java.prog     Java programming
FLAMENCO       flamenco      Flamenco music
tango-l        tango         Argentine Tango dancing
tm-en-help     tm-en         Emacs TM mime package mailing list
w3-beta        w3            Emacs WWW mailing list
You then convert the grabbed LIST to a new folder name with a
conversion table:
JA_LIST_CONVERSION = "\
jde java.jde,\
java java.prog,\
FLAMENCO flamenco,\
"
And to detect all mailing lists, you only need one recipe, like
below:
INCLUDERC = $PMSRC/pm-jalist.rc
:0 : # if list name was grabbed
* ! LIST ?? ^^^^
$LIST_SPOOL_DIR/list.$LIST
12.2 Using plus addressing foo+bar@address.com
If you have a recent enough (8.8.8+) sendmail, please ask your
sysadm to activate the plus addressing. Procmail gets bar in $1
automatically.
http://www.faqs.org/faqs/mail/addressing/
[Bennett Todd <bet A T mordor.net>] The PLUS feature has also been
implemented in qmail and Postfix (nee VMailer). By default
qmail uses "-" rather than "+", but it can be configured to use
different rules; Postfix doesn't come with either enabled, but its
example main.cf has a commented-out line to enable "+"-based
support.
[Roy S. Rapoport <rsr A T macromedia.com>] Plus addressing is
implemented using sendmail (well, I'm sure the other MTAs can also
do it, but my experience is with sendmail). The last few releases
of sendmail (8.8.6, 8.8.7, 8.8.8) all seem to automatically default
to allowing it. Basically, for any address of the form foo+baz,
sendmail ignores the +baz part and just delivers it to foo.
If you want the easiest method to handle mailing list mails, then
subscribe to each list by using a dedicated plus address:
login+list.procmail@example.com
login+list.debian@example.com
login+list.linux@example.com
When you receive a message from any of these mailing lists to your
login account, the list.procmail part is already in variable $1 and
the recipe to sink all mailing lists to their individual folders is
very simple:
# Note: The $1 contains a value only _IF_ procmail
# is invoked with option -m or -a (with an argument).
# Be sure procmail is invoked with that option, either from the
# LDA or from ~/.forward.
#
# $1 is a pseudo variable and it can't be used in a condition line,
# so we copy the value to ARG.
ARG = $1
:0 :
* ARG ?? list
$ARG
[david] Here is what I have configured to sendmail.cf to support
plus addressing:
Mprocmail, P=/usr/bin/procmail, F=DFMmShu, \
S=11/31, R=21/31, \
T=DNS/RFC822/X-Unix, \
A=procmail -m $h $f $u
Well, this is the definition of the procmail mailer, not the local
mailer. Furthermore, there's more to plus-addressing support than
the definition of the local mailer. Ruleset 0 or 5 needs to be set
up to move everything after the + into the 'host' variable ($h).
Unless you have a strong understanding of sendmail rule sets and
rewriting rules, you should not attempt to add plus-addressing to
your sendmail.cf, but instead just install the latest version of
sendmail and use the m4 sendmail.cf generation tools with a .mc
file that contains:
FEATURE(local_procmail, `/usr/local/bin/procmail')
plus whatever else your site requires.
...Ok, I corrected it. Well, here's what that looks like. I did
look into the part about Ruleset 5 while trying it on
originally. But all I could do was make sure that the
plus-addressing section was there.
Mlocal, P=/usr/bin/procmail, \
F=lsDFMAw5:/|@qSPfhn9, S=10/30,
R=20/40,
T=DNS/RFC822/X-Unix,
A=procmail -Y -a $h -d $u
Mprog, P=/bin/sh, F=lsDFMoqeu9, S=10/30, R=20/40, D=$z:/,
T=X-Unix,
A=sh -c $u
12.3 Using RFC comment trick for additional information
Recall from [rfc1036] that the preferred Usenet mail address
formats are the following:
From: login@example.com
From: login@example.com (First Surname)
From: First Surname <login@example.com>
I invented this idea after reading Eli's excellent FAQ about mail
addressing. Please read it (especially section 19.) before you
continue in order to understand what I'm going to present.
I have an account which does not support plus addressing and I was
kind of jealous of everyone who could use this neat sendmail
addressing scheme. Plus addressing makes it so much easier to deal
with mailing list messages.
But as it turns out, we can simulate plus addressing to some extent
with a pure RFC compliant address. We exploit the RFC comment
syntax, where a comment is any text inside parentheses. According to
Eli's paper, comments should be preserved during transit. They may
not appear in the exact place where they were originally put, but
that shouldn't be a problem. So, we send out a message with the
following From or Reply-To line:
first.surname@domain (First Surname+list.procmail)
Now, when someone replies to you, the MUA usually copies that
address as is, and at the receiving end you can read the PLUS
information and drop the mail to the appropriate folder: list.procmail.
[About subscribing to mailing lists with RFC comment-plus address]
It's very unfortunate that when you subscribe to lists, the comment
is not preserved when you're added to the list database. Only the
address part is preserved. I even put the comment inside angles to
fool the program into picking up everything between the angles.
first.surname(+list.procmail)@example.com
But I had no luck. They have too good RFC parsers, which throw away
and clean comments like this. E.g. procmail based mailing lists, the
famous SmartList, use formail to derive the return address and
formail does not preserve comments. The above gets truncated to
first.surname@example.com
Also many mailing lists send out messages as Bcc, so your address
is not even available in headers anywhere, neither is this nice RFC
comment. Ah well, but this RFC comment trick works very well in
private communication, virtually all MUAs copy whole contents of a
From or Reply-To header to To header, preserving comments and
you get the benefit of plus addressing. Here is procmail code
to demonstrate reading the PLUS information from RFC comment-plus
field:
RC_EMAIL = $PMSRC/pm-jaaddr.rc # Address explode module
:0
*$ ^To:\/.*
{
INPUT = $MATCH
INCLUDERC = $RC_EMAIL # Explode grabbed To address
# If COMMENT_PLUS was defined, module found "+"
# address which contained, say, "mail.procmail".
# Save it to folder.
:0 :
* COMMENT_PLUS ?? [a-z]
$COMMENT_PLUS
}
Pretty simple. And you can put anything inside the RFC comment and
do whatever you want with these plus addresses. NOTE: there are no
guarantees that the RFC comment is preserved every time. Well, the
standard RFC822 says it must be passed untouched, and I'd say that
in 90% of the cases where mail is delivered from one server to
another, it is kept.
Example: if you discuss in Usenet groups, you could use these addresses:
first.surname@example.com (First Surname+Usenet.default)
first.surname@example.com (First Surname+Usenet.games)
first.surname@example.com (First Surname+Usenet.emacs)
first.surname@example.com (First Surname+Usenet.linux)
12.4 Simple mailing list handling
[Peter S Galbraith <galbraith A T mixing.qc.dfo.ca>] I have used
this in the past (by simply looking at the spool file and seeing the
From_ line of the message):
:0 :
* ^From debian
list.debian.mbox
:0 :
* ^From procmail
list.procmail.mbox
Now, I collect specific high-volume mailing lists (like Debian) into
their own spool files like above, and let other recipes catch all
other mailing lists (like procmail and fvwm) into a single spool
file with later rules:
:0 : # Majordomo lists
* ^Sender: owner-\/[-a-zA-Z0-9_.]*
list.$MATCH.mbox
:0 : # SmartList lists
* ^X-Mailing-List: <\/[-a-zA-Z0-9_.]*
list.$MATCH.mbox
So Debian mailing list mail goes to the Debian file, procmail and
fvwm mail goes to the mailing list files, and mail addressed to me
yet CC'ed to a list goes to my main spool file.
12.5 Archiving according to TO
The traditional way to detect and save mailing list messages is:
:0 :
* ^TO()procmail
list.procmail
[and so on...]
The following code will save the message to folders list.foo, list.bar,
list.procmail when the name is in the TO address.
# Generalised version by David W. Tamkin. The second condition
# rematches against LISTS so that the folder name gets the case
# written in LISTS, not the case found in the header.
LISTS = "(foo|bar|procmail)"
:0:
*$ ^TO_()\/$LISTS
*$ LISTS ?? ()\/$\MATCH
list.$MATCH
12.6 Using Return-Path to detect mailing lists
[philip] For most mailing lists, a more accurate way to determine
whether it came from the list is to examine the Return-Path:, From_
or Resent-From: header. This catches messages from the list,
regardless of whether they were To: the list, Cc: the list, or even
Bcc: the list, something which doesn't show in the message at all.
For instance, I refile messages from the procmail mailing list using
the following recipe:
:0
* ^Return-Path: +<procmail-request@informatik
~/Lists/procmail/.
There's one tricky thing to note: if someone sends a message to
both me and the list (say, responding to a message I
sent to the list), then the copy that got to me through the list
will end up in my procmail folder, while the copy that went
directly won't. I like this behavior, but some people, possibly
yourself, may prefer it if both messages end up re-filed. If so,
your best bet is to combine the above with matching against the To:
and Cc: headers via the ^TO_ token:
:0
* ^Return-Path: +<procmail-request@informatik|\
^TO_()procmail@informatik
~/Lists/procmail/.
(If you have a version of procmail before 3.11pre4, then you'll
need to use "^TOprocmail" instead of "^TO_procmail".) If you're
subscribed to many mailing lists, here is one general recipe.
Notice: you don't want to include \< in the recipe like
^TO_\<\/$LISTS, because the ^TO_ token contains something similar to
\< but better, so the \< can only cause problems. A trailing
\> is not a bad idea, though. Because it's not a zero-width
assertion but rather an actual character class, you have to strip
it from the match.
LISTS = "(foo-list|bar-list)"
# 1) to get the match
# 2) rematch sans the trailing \>
# 3) Note: preserves capitalization of the string
:0
*$ ^TO_()\/$LISTS\>
*$ MATCH ?? \/$LISTS
*$ LISTS ?? ()\/$\MATCH
{
M = $MATCH
<action>
}
[Era] gives this example to describe what happens above:
VAR = "MOO"
what = "(moo|bar|baz)"
:0 # Search what from VAR
*$ VAR ?? ()\/$what
{
# Now, what was it that really matched? There were several
# choices: moo, bar, baz
# Beware: $MATCH must not contain regexp characters
:0
*$ what ?? ()\/$MATCH
{ } # no-op
# Fine, new MATCH contains moo
}
13.0 Procmail, MIME and HTML
13.1 Mime Bibliography
List of annoying things that various MIME implementations do.
...The result is a sort of style guide for implementors of things that
generate MIME. Feel free to send comments or contributions.
http://www.cs.utk.edu/~moore/mime-style.html
13.2 Mime content type application/ms-tnef
...A member of one of my mailing lists appears to be using
Microsoft Mail. His messages to the list are usually accompanied by
an encoded attachment like this one:
"c:\eudora\users\steven@idma.com\attach\WINMAIL11.DAT". The message
headers include the following clause: Content-Type: multipart/mixed;
boundary="openmail-part-058c9f3d-00000001". This is driving people
crazy. What is causing this and is there any way to make it stop?
Most likely the sender is using Exchange (or Windows Messaging or
Outlook97) and sent the messages in Rich Text Format. It puts the RTF
message in an attachment called WINMAIL.DAT (application/ms-tnef). But
this attachment is useless unless the recipient is also using
Exchange.
The sender can turn off the RTF option for messages to you. For more
information, see: XCLN: Sending Messages In Rich-Text Format
http://support.microsoft.com/support/kb/articles/q136/2/04.asp
13.3 Trapping HTML mime messages
[era] Here's a simple filter to throw out unwanted HTML that is sent
by using mime. [jari] This recipe detects if the message is
classified as mime text/HTML and junks it to separate folder. It
does not change the message content. If you want to actually
remove HTML or other attachments from the message, see
pm-jamime-kill.rc in the module list.
:0:
*$ ^Content-Type:$s*multipart/(mixed|alternative);\
$SPCNL*boundary="?\/[^;"]+
*$ B ?? ^--$\MATCH\$([-a-z]+:.*)*Content-type:$s*text/HTML
junk.html.mbox
Some more examples can be found in the section 'Explaining ^^ and ^'.
13.4 Complaining about HTML messages
[Marek Jedlinski <eristic A T gryzmak.lodz.pdi.net>] This is how I
respond to HTML messages. In my noHTML.txt I politely explain
why I don't appreciate receiving HTML mail, and ask the sender to
resend the message as plain text. What happens in the majority of
cases is that the sender resends the same message again ("oh, it
bounced, let's try again") and I assume they don't actually read my
explanation since they just happily resend the HTML cr*p. It bounces
again, at which point they give up... Tough luck, I say ;)
BTW, the recipe below is placed after mailing list mail gets
sorted. When someone sends HTML mail to a mailing list I read, I
just flame them in person.
TXT_NO_HTML = $HOME/noHTML.txt
:0
* ! H ?? ^FROM_DAEMON
*$ ! H ?? ^$XLOOP
* HB ?? ^Content.Type.+multipart.alternative
* HB ?? ^Content.Type.+text.html
{
LOG = "$NL --TRASH: multi-part HTML $NL"
:0
| ($FORMAIL \
-rtk \
-A "X-Mailer: Procmail Autoreply" \
-A "$XLOOP" ; \
cat $TXT_NO_HTML \
) | $SENDMAIL
}
13.5 Converting HTML body to plain text
Note: Older lynx has security holes:
http://ciac.llnl.gov/ciac/bulletins/h-82.shtml
http://lynx.browser.org/
The most popular solution to convert HTML body into plain text is to
use lynx. Another more straightforward method is to use a perl one
liner: it's quicker, easier to use with procmail but it doesn't pretend
to know about HTML DTD. The recipe below should be taken with grains of
salt: seeing HTML tag is no guarantee that the body "only" has HTML. A
cautious recipe writer also watches for MIME multi part messages. (See
pm-jamime.rc to draw some mime characteristics from message)
This recipe has been written so that you can add more alternative
HTML conversion scripts. You may even want to select the appropriate
conversion for a message: e.g. perl for unimportant ones.
Note: This is an oversimplified method of checking if the body
contains HTML. It would probably be a good idea to check the mime
headers which indicate HTML encoding here as well.
:0
* B ?? ()<HTML>
* B ?? ()</HTML>
{
conversion = "lynx" # or select this conditionally
:0
* conversion ?? lynx
{
# In new lynx versions you can read from stdin. If
# /dev/stdin doesn't exist, try /dev/fd/0
#
# lynx -dump -force_html -nolist -restrictions=all \
# /dev/stdin
#
# Without a global lock on this, you have a chance
# that two procmail instances will try to write to
# msg.dump
file = "$HOME/tmp/msg.dump"
LOCKFILE = $file$LOCKEXT
:0 fbw
| cat > $file && lynx -dump $file
LOCKFILE
}
:0 E fbw
| perl -0777 -pe 's/<[^>]*>//g'
}
13.6 Getting rid of unwanted mime attachments (HTML, vcard)
Microsoft and Netscape MUAs are conquering the PC world and it's
likely that you will receive messages from people that use this
software. The unfortunate thing is that you receive the message in
mime format:
HEADERS
--mime-boundary
plain text
--mime-boundary
Some idiotic HTML (or other type) copy of the text
--mime-boundary
when you would rather see a traditional message with just the
headers and the plain text.
Good news. There's a procmail module that addresses this problem. The
module can kill any mime attachment and the predefined sets include
typical cases:
- Microsoft Explorer has a bad habit of including 7k
application/ms-tnef attachment to the end of message.
- Lotus Notes sends similar extra attachment.
- Microsoft Express sends a copy of message in HTML format in the
attachment.
- Netscape's Mozilla sends a copy of message in HTML. See
example. It also sends annoying vcards.
The module is called pm-jamime-kill.rc and included in Jari's
pm-code.zip.
(Note: Procmail module list)
13.7 Sending contents of a HTML page in plain text to someone
[timothy] Send a mail with the subject "GetPage:
some.url.here/" and the page comes back. Kurt Thams <thams A T thams.com>
also pointed out that lynx allows the file:// protocol and, since
procmail is running as you, this would be a security risk.
We make the script safe here by forcing "http://$MATCH" and not
simply using "$MATCH".
:0
*$ ^Subject:$s+GetPage:()\/.*
*$ ! ^$MYXLOOP
| ($FORMAIL \
-rt \
-I "Precedence: junk" \
-I "Subject: Requested page: $MATCH" \
-I "$MYXLOOP" ; \
lynx -dump "http://$MATCH" \
)| $SENDMAIL
[era] If all you need is to create a suitable MIME package, there
are various MIME command-line utilities such as metasend (which is
for interactive use, and so doesn't work very well with Procmail)
and mpack you can try. If your needs are simple, you could even
read up a bit on the MIME spec and generate the necessary headers
and separators yourself (echo Content-Type: multipart/mixed etc etc
etc). Conversely, if your needs are complex, get the Perl MIME
package from CPAN and cook up your own tool. The MIME FAQ
(especially part 6) is a good place to look for info.
http://www.faqs.org/faqs/by-newsgroup/comp/comp.mail.mime.html
14.0 Simple recipe examples
14.1 Saving: MH folders – numbered messages
Hm. This is explained in the procmail man pages, but not very
well. There are just one or two occasions where the man page tells
how to create individual files instead of catenating messages
to a folder. Notice the /. at the end of the folder name:
:0
* condition
dir-folder/.
[manual] When delivering to directories (or to MH folders) you
don't need to use lockfiles to prevent several concurrently running
procmail programs from messing up.
On a save to a directory, how does procmail determine what to put
after $MSGPREFIX to complete the name of the file?
[philip] It's the inode number of the file encoded in
base-64 with the set of characters A-Za-z0-9-_, in reverse order.
So, for example, the inode numbered 59699 would be encoded as
follows:
59699 = 51 + 64 * ( 36 + 64 * 14 )
A=0, B=1, ..., N=13, O=14, ..., a=26, ..., k=36, ..., z=51,
0=52, ...
--> zkO
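The arithmetic above can be replayed with a few lines of sh. This is only a sketch of the digit encoding described by [philip], not procmail's actual code; encode_b64rev is a made-up helper name, and anything else procmail may do when it builds the file name is not modelled here.

```shell
#!/bin/sh
# encode_b64rev is a hypothetical helper, not part of procmail.
# It writes a non-negative integer in base 64 using the alphabet
# A-Z a-z 0-9 - _ with the least significant digit first.
encode_b64rev() {
    n=$1
    alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
    out=''
    while :; do
        digit=$((n % 64))
        # cut counts characters from 1, our digits start at 0
        out="$out$(printf '%s' "$alphabet" | cut -c $((digit + 1)))"
        n=$((n / 64))
        [ "$n" -gt 0 ] || break
    done
    printf '%s\n' "$out"
}

encode_b64rev 59699
```

For the inode number 59699 the digits come out as 51, 36 and 14, which the alphabet maps to z, k and O, the same string as in the worked example.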
14.2 Saving: to monthly folders
# Use any date method mentioned previously to define variables
# YYYY YY MM DD. Archive digests monthly
:0 c:
* ^From:.*\/mailing-list-digest@example.com
{
# Get the "mailing-list-digest" string, do not use following
#
# MBOX = `echo $MATCH | sed -e 's/@.*//' `
#
# Because we really don't need those extra shell processes.
# Procmail can derive the word 10x more efficiently
:0
* MATCH ?? ()\/[^@]+
{
MBOX = $MATCH
}
:0 :
$YYYY-$MM-$MBOX
}
14.3 Modifying: Filtering basics
Pay attention to the cat command position in each recipe.
:0 fbw
| echo "This is a line of text _before_ the body"; \
cat -
:0 fbw
| cat - ; \
echo "This is a line of text _after_ the body"
:0 fbw # prepend text before the body
| cat msg.txt -
:0 fbw # append text at the end of body
| cat - msg.txt
:0 fbwi # replace the body with text from file
| cat msg.txt
14.4 Modifying: Squeezing empty lines around message body
[david] Anything that replaces the body is going to require an outside
process, even if it's only /bin/echo. In order to trim empty lines
from the beginning of message and from the end of message, you can
do this, if the entire body fits into LINEBUF:
:0 fbw
* B ?? ^^$*\/.(.|$)*.$
| echo "$MATCH" # trailing extra newline intended
If your version of cat is BSD-ish,
# SysV's cat has a different meaning for -s and cannot do this
:0 fbw
* B ?? $$$
| cat -s
otherwise, it can be done with a very simple sed filter:
:0 fbw
* B ?? ^^($)|$$$
| sed /./,/^$/!d
Note that cat -s has slightly different results from the others: if
there are any empty lines at the top of the body, cat -s will keep
one. The echo and sed suggestions will remove all empty lines from
the top and, like cat -s, keep one at the bottom.
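The sed filter is easy to try outside procmail. The sketch below wraps it in a hypothetical function, squeeze_body, and feeds it a made-up body with empty lines at the top, in the middle and at the bottom.

```shell
#!/bin/sh
# squeeze_body is a hypothetical wrapper around the sed filter from
# the recipe above, so that it can be tried outside procmail.
squeeze_body() {
    sed '/./,/^$/!d'
}

# Two empty lines on top, two in the middle, two at the bottom.
printf '\n\nfoo\n\n\nbar\n\n\n' | squeeze_body
```

The filter prints only the ranges that start at a non-empty line and end at the next empty line, so the leading empty lines disappear and every later run of empty lines shrinks to one, including the run at the bottom.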
14.5 Modifying: shuffling headers always to same order
[phil] To sort the headers in the message into a predictable order,
you can use the following recipe. The spaces between -I and its
argument have been eliminated in the recipe below, because the shell
may or may not allow unquoted spaces in the second part of
${variable:+blah}. For example, under Solaris 2.6, /bin/sh barfs on
${FROM:+-I "From: $FROM"}, while /bin/ksh handles it just fine. I
think the POSIX shell standard requires that it be allowed, but,
well, will your next system be POSIX compliant?
:0
* ()\/^From: +\/.*
{
FROM = $MATCH
}
:0
* ()\/^Reply-To: +\/.*
{
RT = $MATCH
}
:0
* ()\/^X-Mailer: +\/.*
{
XM = $MATCH
}
:0
* ()\/^Message-Id: +\/.*
{
MID = $MATCH
}
:0
* ()\/^Date: +\/.*
{
DATE = $MATCH
}
:0
* ()\/^To: +\/.*
{
TT = $MATCH
}
:0
* ()\/^CC: +\/.*
{
CC = $MATCH
}
:0
* ()\/^Subject: +\/.*
{
SUBJ = $MATCH
}
:0 fh w
| $FORMAIL \
${XM:+-I"X-Mailer: $XM"} \
${TT:+-I"To: $TT"} \
${FROM:+-I"From: $FROM"} \
${RT:+-I"Reply-to: $RT"} \
${CC:+-I"Cc: $CC"} \
${MID:+-I"Message-Id: $MID"} \
${DATE:+-I"Date: $DATE"} \
${SUBJ:+-I"Subject: $SUBJ"}
14.6 Service: Auto answerer to empty messages
[elijah] Here is a piece of code that responds to empty
messages.
:0
* ! B ?? ...
| (echo "From: me@example.com" ; \
$FORMAIL -r -A"Precedence: junk" \
-A"X-Loop: me@example.com" ; \
echo "Your blank message was received.\n" \
"Did you mean to say something?\n" \
"\n" \
"-- \n" \
"My Signature\n" \
"this has been an automated response\n" \
) | $SENDMAIL
14.7 Service: Ping responder
Sometimes I'm on the road and I don't seem to get access to the
site where my messages are. The telnet connection fails and
standard Unix "ping" plays dead for me. "What's happening in that
site?" I wonder. Here is a recipe that I have added to all of my
accounts. It sends an immediate reply if at least the mailhost is
up and gives some status information.
:0
* ^Subject: ping$
{
:0 fh
| $FORMAIL -rt
# Remember, Don't send back anything that would be vital to
# attacker. It doesn't matter if the `uptime` or other
# scripts fail, the reply is sent anyway.
:0 c # Record this ping request
| ( cat -; \
echo `uptime`; \
echo "$HOST User count: " `who | wc -l`; \
) | $SENDMAIL
:0 : # or sink to $DEFAULT
$PING_SPOOL
}
14.8 Service: simple vacation with procmail
Don't forget to look into the procmailex(5) man page, which also has
a vacation example. The ones presented below may not work for you.
Here is a very simple vacation recipe. Whenever the file ~/.vac
exists, the vacation program is called. Be sure that you have the
~/.vacation.msg file ready too. Remember that vacation does not
save your messages, so we need the c flag here.
# Some prefer the non-dotted file which shows up in ls listing
vacationFlagFile = $HOME/.vac
:0 wc
*$ ? $IS_EXIST $vacationFlagFile
| vacation $LOGNAME
Some people like to raise a flag in .procmailrc instead of creating a
file. If you like the variable approach better, here is the equivalent
implementation of the above:
VACATION = "yes" # Comment this out when not on vacation
:0 wc
* VACATION ?? yes
| vacation $LOGNAME
[philip] and [era] Since vacation only sends replies and never
saves the original messages, here is one way to do both things with
your .forward file. Substitute "abc" with your login name.
|/usr/ucb/vacation","exec /usr/local/bin/procmail -f- ||exit 75 #abc
14.9 Service: vacation code example
[By Eric Black <eric A T Mirador.COM>] Here is the procmail part:
OFFSITE = "my_guest_login@wherever.I.am.example.com"
# Forward urgent mail to me at my off site address; afterward,
# continue processing it as normal. The procmail pattern match
# may be case-insensitive, in which case this rule could be
# simplified...
:0 c
* ^Subject: .*urgent
| $SENDMAIL $OFFSITE
# Use "vacation" to tell other people I'm not here. To enable,
# un-comment the next two lines; to disable, comment them out.
#
# The -a identifies another name that can legitimately
# appear in the To: line of the mail header instead
# of your login name
:0 wc
| vacation -a ericb eric
And here is the ~/.vacation.msg file:
Subject: I'm out of town for a while
From: eric (via the vacation program)
I'm out of town until <return-date>. Your mail regarding
"$SUBJECT"
will be read when I return, or possibly at some unknown
time before then if I get a chance to check for mail.
If your message must be seen by me before I return,
you can send it with the word "URGENT" in the subject header.
Such mail will be automatically forwarded to me so that
I see it sooner.
--Eric
14.10 Service: Auto-forwarding
[timothy] I have my .procmailrc set up to forward mail to another
(mail only) account. When I am not going to be at that account, I
want to turn forwarding off.
# Look for the file that tells us whether or not to forward mail:
# if the file exists, forward the mail, otherwise don't.
ELSWHERE = "me@elsewhere.example.com"
FILE = "$HOME/.forwardmail"
:0 c
*$ ? $IS_EXIST $FILE
! $ELSWHERE
# If a message arrives from the other account
# with the Subject 'forward-off' then rename the
# file, effectively turning forwarding off
:0 hwic
*$ ^From:.*$ELSWHERE
* ^Subject: forward-off
| $NICE mv -f $FILE $FILE.off
# If a message arrives from the other account
# with the Subject 'forward-on' then restore the
# file, effectively turning forwarding on
:0 hwic
*$ ^From:.*$ELSWHERE
* ^Subject: forward-on
| $NICE mv -f $FILE.off $FILE
14.11 Service: forward only specific messages
Here is a piece of code that triggers forwarding according to
addresses. If you have a lot of this kind of forwarding,
you should use a simple awk database which you would grep.
# By Jim Hribnak <hribnak A T nucleus.com>
# info@1.example.com goes to joe@1.example.com
# info@2.example.com goes to fred@2.example.com
:0
* ^TO_()info@1.example.com\>
{
FORWARDTO = "$FORWARDTO joe@1.example.com"
}
:0
* ^TO_()info@2.example.com\>
{
FORWARDTO = "$FORWARDTO fred@2.example.com"
}
:0 fhw
* FORWARDTO ?? @
* ! ^$MYXLOOP
| $FORMAIL -A "$MYXLOOP"
:0 a
! $FORWARDTO
14.12 Service: Making digests
# By <jimo A T eskimo.com>
# Add this message to the digest accumulator
:0 c:
| $FORMAIL -k -X From: -X Message-Id -X Date -X Subject >> $DIGEST
# Check size of digest, and send it off if it's big enough
:0
*$ -$DIGSIZE ^0
*$ `wc -l <$DIGEST` ^0
| $NICE send-digest $DIGEST
14.13 Kill: killing advertisement headers and footers
A mailing list that I subscribe to recently began adding a block of
"boiler plate" text to the beginning and end of every message
that goes through the list (groan). The text is always the same,
and is always at the beginning and end of the message.
[david] sed could do both at once, but the problem is that
sed never knows when it is N lines from the end if N>0; it knows
the last line when it reads it, but when it is looking at the
next-to-last line it doesn't know that there is only one more line
to come. It does, however, know how many lines of input it has
already read.
So I have three suggestions: if you know that the header is X lines
long [let's say 5 for this example] and that the first line of the
footer contains some string or pattern that will not occur in the
significant part of the post,
:0 fbwi
* conditions
| sed -ne 1,5d -e '/pattern/q' -e p
If you recognize the end by the last line that you want to keep
instead of the first line that you want to delete, omit the n
option and the p instruction:
| sed -e 1,5d -e '/pattern/q'
Finally, if the only reliable way to spot the footer is by reaching
so many lines from the end (because any search pattern might occur
in the real text as well), we can score as you've been doing to get
the number of the last significant line. Let's say the footer is
three lines long; because ^.*$ always counts one line too many
(long story), we subtract four instead of three:
:0 fbwi
* conditions
* 1^1 B ?? ^.*$
* -4^0
| sed -e 1,5d -e "$="q
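The first suggestion (drop a fixed number of leading lines, stop at the first footer line) can be tried outside procmail with a short sh sketch. trim_boilerplate is a hypothetical helper, and the advertisement and footer lines are made-up sample data; the footer pattern is assumed not to contain a slash.

```shell
#!/bin/sh
# trim_boilerplate is a hypothetical helper: $1 is the number of
# leading lines to drop, $2 a pattern that marks the first footer line.
trim_boilerplate() {
    sed -n -e "1,${1}d" -e "/$2/q" -e p
}

printf '%s\n' 'AD line 1' 'AD line 2' 'real text' 'more text' \
    '--- list footer ---' 'unsubscribe info' |
    trim_boilerplate 2 '^--- list footer'
```

Because of -n, sed prints only what the p instruction asks for: the first two lines are deleted, the two lines of real text are printed, and sed quits at the footer line without printing it.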
14.14 Kill: simple kill file recipe with procmail
Kill files are widely used with news readers to delete uninteresting
posts when you enter a newsgroup. A kill file usually contains one
entry per line to match the message content, and this can be
easily done with procmail. Remember, however, that procmail then
forks a process for every message, so be sure your recipes are in
this order, where the kill file rules are applied only to unknown
messages:
SINK MAILING-LISTS
SINK ANNOUNCEMENTS
SINK WORK MESSAGES
OTHER DELIVERIES
apply kill file rules and UBE recipes to the rest
The recipe will drop the message (i.e. consider it 'delivered') if one
of its headers matches a pattern in the kill file.
:0 hW: $HOME/.killfile$LOCKEXT
| egrep -i -f $HOME/.killfile
The reason why there is an explicit lock file is that you must be able
to update the kill file while your procmail is running. An example
edit script is presented below. It edits a temporary copy and moves
the copy into place while holding the lock.
#!/bin/sh
# program: killfile.sh
#
file=$HOME/.killfile
lock=$file.lock
cp $file $file.tmp
emacs -q $file.tmp # or use whatever you prefer: vi, pico
lockfile $lock
mv $file.tmp $file
rm -f $lock
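To see what the grep call in the kill file recipe decides for a given header, here is a small sh sketch. The patterns, the addresses and the helper name is_killed are made up for illustration; grep -E is used as the modern spelling of egrep, and -q only suppresses the matching lines so that the exit status alone carries the answer.

```shell
#!/bin/sh
# A sketch with made-up patterns and headers, one pattern per line
# as in a real kill file.
killfile=$(mktemp)
cat > "$killfile" <<'EOF'
^From:.*spammer@example\.com
^Subject:.*make money fast
EOF

# is_killed (hypothetical helper) reads a message header on stdin;
# exit status 0 means that some kill file pattern matched.
is_killed() {
    grep -E -i -q -f "$killfile"
}

printf 'From: friend@example.org\nSubject: MAKE MONEY FAST\n' |
    is_killed && echo "killed"
printf 'From: friend@example.org\nSubject: lunch?\n' |
    is_killed || echo "kept"
```

The exit status is what matters to procmail: with the W flag a successful grep counts as a delivery, so the message is dropped, while a failing grep lets processing continue with the next recipe.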
14.15 Kill: duplicate messages
[Lars Kellogg-Stedman <lars A T bu.edu>] Put this as the first entry in
your .procmailrc and you won't see any duplicates as long as the 8K
cache doesn't get full. The duplicates folder is cleaned out
weekly via a cron job. While it may be tempting to simply sink
duplicates to /dev/null, I have come across broken mail clients that
stick the same value in the Message-id header of all outgoing
mail.
:0
* ^Subject:\/.*
{
SUBJECT = $MATCH
}
MID_CACHE_LEN = 8192
MID_CACHE_FILE = $PMSRC/msgid.cache
MID_CACHE_LOCK = $PMSRC/msgid.cache$LOCKEXT
LOCKFILE = $MID_CACHE_LOCK
# IF the message has a message-id header
# AND formail -D is successful (exit status=0)
# THEN
# log a message to the procmail log
# sink the message
:0
* ^Message-Id:
* ? $FORMAIL -D $MID_CACHE_LEN $MID_CACHE_FILE
{
LOG="dupecheck: discarded message, $SUBJECT $NL"
:0 # Store duplicates, notice no lock!
duplicate.mbox
}
LOCKFILE # Release lock by killing variable |
And here is a somewhat simpler recipe, a slightly modified version
from the [manual]. Procmail notices formail's success and considers
the message delivered, but because of the c flag it does not stop
processing the rcfile, which lets the message fall through into a
safety copy inbox.
:0 hWc: $PMSRC/pm-msgid.cache$LOCKEXT
* ^Message-id:
| $FORMAIL -D 8192 $PMSRC/pm-msgid.cache
:0 a:
duplicate.mbox |
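The idea behind the Message-Id cache can be tried outside procmail.
The sketch below is not how formail -D is implemented (formail keeps
a fixed-size cache and is run under a lock in the recipes above); it
is a simplified stand-in, and the function name seen_before is
invented for this illustration.

```shell
# seen_before MESSAGE_ID CACHE_FILE   (hypothetical helper)
# Succeeds (exit 0) if MESSAGE_ID is already listed in CACHE_FILE,
# which corresponds to a duplicate hit. Otherwise it records the id
# and fails (exit 1).
# Simplified: the cache grows without bound and there is no locking.
seen_before() {
    if [ -f "$2" ] && grep -F -x -q -e "$1" "$2"; then
        return 0
    fi
    printf '%s\n' "$1" >> "$2"
    return 1
}
```

The exit status follows the same convention the recipes rely on:
success means "this one is a duplicate".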
There was a pretty heavy thread around September 1997 about
duplicate detection, where some promising stuff was posted.
One item you should definitely have in your collection is
Eli's hashd [in Procmail mailing list 1997-09]
14.16 Kill: spam filter with simple recipes
[Ed McGuire <emcguire A T i2.com>] Seeing several junk mail filters
posted recently, varying from the simple to the complex, I thought
I would also share my own. I junk whatever comes from my ISP but is
not addressed to my domain or to one of the mailing lists I
subscribe to.
# 1. mail to my domain
# 2. NOT addressed to me directly
# 3. NOT coming from mailing lists I'm subscribed to.
:0:
* ^(received):.*psi\.com
* ! ^((apparently-)?to|cc):.*(i2|intellection)\.com
* ! ^(to|cc):.*(pdp-?8-lovers|procmail|sunshine|info-pdp11)
junk.ube.mbox |
[Gordon Matzigkeit <gord A T m-tech.ab.ca>] I have just
discovered an effective rule for separating SPAM from the rest
of my e-mail. Just substitute your username for gord in the
line below
# Anything which is not addressed to me is probably SPAM.
:0:
* !^TO().*\<gord\>
junk.ube.mbox |
This only works because I handle all mailing list addresses above
that point in my .procmailrc (i.e. all traffic that arrives from
mailing lists that I am subscribed to goes into other folders).
Most SPAMmers seem to do it nowadays by sending mail via mailing
lists, rather than creating huge To lists of users.
Sysadmins often install a list of known addresses that send spam and
then check the incoming mail against this "black list". Keep in mind
that some fgrep implementations have a problem with the -w word
switch. Note that the recipe below scans the FULL HEADER, so use it
with some caution, i.e., be careful what you add to your list of
spam domains.
# by [philip]; egrep would do here too, if it is posix
# compliant, it may have -f switch that makes it behave
# like fgrep.
#
# Note: option -F makes [ef]grep search for fixed strings
# instead of regexps.
#
BLOCK_FILE = $HOME/Mail/DeniedNames.lst
UBE_MBOX = $HOME/Mail/junk-ube.mbox
# To filter out the Subject lines, so that mails sent
# with the subject "Have you received a message from
# blah-blah@spam" don't get filtered.
# [era] suggested we use formail
#
# Edsel Adap <edsel.adap A T Canada.Sun.COM> agrees there is a
# likely bug in Solaris 2.5.1 "/usr/bin/fgrep -i" and
# suggested the use of /usr/xpg4/bin/fgrep instead.
#
# <edsel.adap A T canada.sun.com> Sun Microsystems Developer Support
# Files in /usr/xpg4 are available via the SUNWxcu4 package,
# which is part of the user, developer, all, or Xall Solaris
# clusters.
#
# Solaris 2.4 doesn't have /usr/xpg4/bin/fgrep :-(, you
# must use `tr A-Z a-z' before piping the message to fgrep.
:0 hw:
*$ ? $FORMAIL -ISubject: |fgrep -i -f $BLOCK_FILE
$UBE_MBOX |
The file DeniedNames.lst is simply a list of addresses:
82338201@compuserve.example.com
Dwnliner@ix.netcom.example.com
Emerald@earthstar.example.com
FreeWay@dm1.example.com |
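Where fgrep -i cannot be trusted, the tr workaround mentioned in the
comments above can be tested by hand. This is only a sketch: it
assumes the entries in the block file are stored in lower case, and
the function name header_in_blocklist is hypothetical.

```shell
# header_in_blocklist BLOCK_FILE   (hypothetical helper)
# Lower-cases the header read from stdin and looks for any fixed
# string listed in BLOCK_FILE. Entries in BLOCK_FILE must already be
# lower case, because no -i switch is used.
header_in_blocklist() {
    tr 'A-Z' 'a-z' | grep -F -q -f "$1"
}
```

The function is silent; only its exit status says whether a listed
address was found.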
14.17 Kill: (un)subscribe messages
I'm getting tired of those pesky (un)subscribe messages that
certain "other" mailing lists seem to pass through to the list at
large instead of capturing them at the list server, like SmartList
does.
[Adam Shostack <adam A T bwh.harvard.edu>] The following do help,
although they're often too broad. (I use a .safe rule to cover those
cases.) The < 1000 is a useful heuristic: it's rare that unsubscribe
messages are long.
:0 :
* (Delete|u*n*Sub(s| )*| add | leave | help )
* < 1000
junk.misc.mbox |
[Rodger Anderson <rodger A T hpbs2245.boi.hp.com>] I've been
working on a recipe to filter out those pesky s*bscribe and
uns*bscribe messages from mailing lists, and I'm posting what I have
so far. As an aside, it also filters out very short messages, which
I've found are usually some sort of message meant for the list
owner/request address.
I give heavy weight to Subjects starting with (un)?s*bscribe, with
also pretty heavy weight to Subjects containing either of those
words. I then give heavy weight to the body of messages starting
with those words, and a lighter weight to lines starting with them.
Then multiple occurrences get some weight too, up to a point. Then I
count the words in the message against all that.
:0
* 1^0
* 30^0 H ?? ^Subject: +(un)?subscribe\>
* 20^0 H ?? ^Subject:.*\<(un)?subscribe\>
*$ 20^0 B ?? ^^$SPCNL*(un)?subscribe\>
*$ 10^0 B ?? ^$SPC*(un)?subscribe\>
* 8^.4 B ?? \\<(un)?subscribe\>
* -.4^1 B ?? \\<$a+\>
junk.misc.mbox |
[Adam Shostack <adam A T bwh.harvard.edu>] How about looking for sub &
unsub, as well as a perennial misspelling 'unsuscribe me'? I also
find filtering on add, leave and help to be useful. This may well be
the only word on the line. I think it has to do with broken list
management packages.
| :0
| * 1^0
| * 30^0 H ?? ^Subject: +(un)?subscribe\>
* 20^0 H ?? ^Subject: +(un)?sub?(scribe)?\>
(The B is often missing, as is the word fragment 'scribe')
| * 20^0 H ?? ^Subject:.*\<(un)?subscribe\>
* 20^0 H ?? ^Subject: +(add|leave|help)$
# fewer points if more words
* 15^0 H ?? ^Subject: +(add|leave|help) |
[david 1998-10-20] You want to match on messages where the
first non-blank thing in the body is "unsubscribe" at the end of a
line, where there are five lines or fewer in the body?
:0
*$ B ?? ^^$SPCNL*unsubscribe$
* 7^0
* B ?? -1^1 ^.*$
junk.misc.mbox |
^.*$ always counts one line too many, so a five-line body will be
counted as six; that's why we need a prejudice of 7. But if the
first non-blank text in the body is "unsubscribe" alone on a line,
is a line count really necessary? True posts that include the word
will have it in the middle of a sentence, such as the preceding
one. What you'll find by specifying a line limit is that
unsubscribe requests with long signatures or attachments at the
bottom of a previous message will get through.
14.18 Time: Once a day cron-like job
[Bill Moseley <moseley A T netcom.com>] If you want to do
something only once a day, then you have to store the date
somewhere and check against that stored date.
YYMMDD_FILE = $HOME/.yymmdd
YYMMDD = $YY-$MM-$DD
# Contains single line of procmail code
# YYMMDD_PREV = ..
INCLUDERC = $YYMMDD_FILE
# If different date, then enter this block
# The echo updates stamp in file.
:0
*$ ! YYMMDD ?? ^^$YYMMDD_PREV^^
* ? echo "YYMMDD_PREV = $YYMMDD" > $YYMMDD_FILE
{
...do the cron jobs..
} |
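The stamp-file logic can be tried out in plain shell before it is
wired into procmail. In this sketch the function name
first_run_today is invented, and the stamp file holds a bare date
string rather than a line of procmail code as in the recipe above.

```shell
# first_run_today STAMP_FILE [DATE]   (hypothetical helper)
# Succeeds (exit 0) only the first time it is called with a given
# DATE (default: today as YYYY-MM-DD) and then records that date in
# STAMP_FILE, so later calls with the same date fail (exit 1).
first_run_today() {
    today=${2:-$(date +%Y-%m-%d)}
    if [ -f "$1" ] && [ "$(cat "$1")" = "$today" ]; then
        return 1
    fi
    printf '%s\n' "$today" > "$1"
    return 0
}
```

Used as first_run_today $HOME/.yymmdd && do_the_daily_jobs, where
do_the_daily_jobs is a placeholder for your own commands.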
14.19 Time: Running a recipe at a given time
If I put a program in my recipes, it will be executed every time a
message arrives. That's a problem, and I'm not allowed to use cron
in this account. I'm looking for some sort of condition to check
the current time and, if it's outside of the hours 11pm and 7am, then
execute the action.
[david] How do your From_ lines look? If they're the traditional
kind that sendmail and smail add, they include the local time on your
system at receipt. So include a check that the hour is between 07
and 22 inclusive, like this:
:0 c
* ^From .*some-address.* (0[789]|1.|2[012]):[0-5][0-9]:
| command |
I included the minutes and the colon that separates the minutes from
the seconds so that the expression for testing the 07-22 range can
match only on the hour.
14.20 Time: Triggering mail and using cron
[david] Put something like the following entries in your personal
crontab for your userid (I'm using $HOME, not knowing whether your
particular cron "cd's" to your home directory first):
0 23 * * * touch $HOME/.mail.relay.on
0 7 * * * rm -f $HOME/.mail.relay.on |
And if your cron doesn't know the HOME variable (that'd be an
exception)
0 23 * * * /bin/csh -c 'touch ~LOGNAME/.mail.relay.on'
0 7 * * * /bin/csh -c 'rm -f ~LOGNAME/.mail.relay.on' |
Then, in your .procmailrc do:
:0 c
* ^From.*some-address
*$ $IS_FILE $HOME/.mail.relay.on
| command |
The command will run only if both the From condition matches and
the file test succeeds. The file test will succeed only between 11pm
and 7am.
In all honesty, if your system gives usable From_ lines, I like the
following suggestion better. I use it all the time to turn blocks of
procmail code on and off at given times or dates, and it works like a
charm. It uses far fewer processes and is less likely to get the
status wrong if for any reason one of the cron jobs fails to run or
doesn't do its job.
This pages only at day time
:0 c
* ^From .*some-address.* (0[789]|1.|2[012]):[0-5][0-9]:
| command |
This pages at night
:0 c
* ^From .*some-address.* (0[0-6]|23):[0-5][0-9]:
| command |
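The two hour patterns can be checked against sample From_ lines
before they go into a recipe. This is a sketch only: the function
name from_line_period is made up, the sample lines used to test it
are assumptions about how your From_ lines look, and the
"some-address" part of the recipes is left out so that only the time
test is exercised.

```shell
# from_line_period   (hypothetical helper)
# Reads one From_ line on stdin and prints "day" when the hour is
# 07-22 and "night" when it is 23 or 00-06, using the same hour
# patterns as the recipes. Prints "unknown" if no " HH:MM:" time is
# found in the line.
from_line_period() {
    line=$(cat)
    if printf '%s\n' "$line" | grep -E -q ' (0[789]|1.|2[012]):[0-5][0-9]:'; then
        echo day
    elif printf '%s\n' "$line" | grep -E -q ' (0[0-6]|23):[0-5][0-9]:'; then
        echo night
    else
        echo unknown
    fi
}
```

Feeding it the first line of a few saved messages is a quick way to
see whether your system's From_ lines are usable for this trick.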
14.21 Decoding: Uudecode
[philip] Here is a piece of code that hands the message body to a
uudecode program when certain conditions match. The magic string
here is "begin ...file"; the body is then fed to
my_uudecode_program, whatever it does with it.
:0 b
* ^From:.*someone@example\.com
* ^Subject: Subject
* B ?? ^begin 644 file.tar.gz
| my_uudecode_program |
14.22 Decoding: MIME
# by Peter Galbraith <galbraith A T mixing.qc.dfo.ca>
# MIME filtering of accented characters and split lines.
#
:0
* ^Content-Type: *text/plain
{
:0 fbw
* ^Content-Transfer-Encoding: *quoted-printable
| mimencode -u -q
:0 A fhw
| $FORMAIL -I "Content-Transfer-Encoding: 8bit"
:0 fbw
* ^Content-Transfer-Encoding: *base64
| mimencode -u -b
:0 A fhw
| $FORMAIL -I "Content-Transfer-Encoding: 8bit"
}
# 1995-10-18 Tim Pickett <tbp A T cs.monash.edu.au>
#
# Decode MIME quoted-printable Content-Transfer-Encoding
#
# Conditions
#
# Mail has a MIME-Version header with a number in it.
# Header saying "Content-Transfer-Encoding: quoted-printable"
# exists
:0
*$ ^MIME-Version:$s*$d*(\.$d*)
*$ ^Content-Transfer-Encoding:$s*quoted-printable
{
:0 fhw # Remove header
| $FORMAIL -I"Content-Transfer-Encoding:"
:0 fbw # Decode the body.
| mmencode -u -q
} |
14.23 How to send commands in the message's body
:0 b
* ^Subject: ARCHIVE
| sed -e '/$s*[^a-zA-Z]/,$ d' | sh |
14.24 Matching two words on a line, but not one
How does one write a recipe that will do this: put mail in a mailbox
if it has a line with two strings (one and two) separated by a space,
but save the mail in an error folder if the line has only the first
string, one (string two is missing)?
[philip] I presume these lines would be located in the body of the
message, and that by "space between one and two" you mean
"whitespace between one and two". If those assumptions are wrong
then you'll need to tweak the following recipes:
# The 'B' tells procmail to look in the body instead of the header.
# The second colon tells procmail to lock the mailbox with a
# local lock file -- if mailbox is a directory then you don't need
# it. The brackets in the condition contain a space and a tab.
:0 :
*$ B ?? one$s*two
default.mbox
:0 :
* B ?? one
error.mbox |
Now, the above will match even if "one" or "two" is part of another
word (at the end in the case of "one" and at the beginning in the
case of "two"). If you don't want that then you'll need to change
the recipes to read:
:0 :
*$ B ?? ()\<one$s*two\>
default.mbox
:0 :
* B ?? ()\<one\>
error.mbox |
14.25 How to define personal XX macros?
By macro, I'm referring to the procmail's FROM_DAEMON, TO and TO_
that you can use in matches. Here is one way to make one's own macro
[alan] Define HEADERS to include those headers you care about. Pick
one of the definitions below (and remove or comment out the
others). Here are three ways to define user to_ macro
- use only To:
- use either To: or Cc:
- To:, Cc:, or Apparently-To:
to_ = '^To:(.*\<)?'
to_ = '^(To|Cc):(.*\<)?'
to_ = '^((Apparently-)?To|Cc):(.*\<)?' |
And you use it like this
:0 :
*$ $to_()foo@example.com
address-matched.mbx |
[jari] and here are some more examples
cc_ = "(^((Original-)?(Resent-)?(Cc|Bcc)):(.*[^a-zA-Z])?)"
from_ = "(^(Apparently-|Resent-)*\
(From|Reply-To|Sender):(.*\<)?|\
^From $NSPC+)" |
14.26 How to change subject by body match
Suppose you want to change the mail's subject when there is a match
in the body. The desired outcome would be this:
From: foo@this.is
Subject: Fault: NNNN in program block YYY << changed
Fault: NNNN in program block YYY |
Here is the answer
:0 fhw
* ^Subject: NOK case report
*$ B ?? ^$s*\/Fault: [0-9a-f]+ in program block.*
| $FORMAIL -I "Subject: $MATCH" |
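The body match can be rehearsed with sed before relying on $MATCH.
This is a sketch under assumptions: extract_fault is a made-up name,
and the sed expression is only an approximation of the procmail
condition above (it takes the first matching line and strips leading
whitespace).

```shell
# extract_fault   (hypothetical helper)
# Reads a message body on stdin and prints the first line that looks
# like "Fault: <hex digits> in program block ...", with leading
# whitespace removed. Prints nothing if there is no such line.
extract_fault() {
    sed -n 's/^[[:space:]]*\(Fault: [0-9a-f][0-9a-f]* in program block.*\)$/\1/p' | head -n 1
}
```

Whatever this prints for a saved message body is roughly the text
that would end up in the new Subject header.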
14.27 How to change Subject according to some other header
Suppose you want to change the subject when mail comes to some
particular address, or when some other header field matches. Here is
one way to do it; we suppose that mail comes to various internal mail
addresses. See the HEADERS macro in the previous section.
# By [alan]
# Examine headers, create a subject tag if we recognize a list
TAG = ""
:0
*$ ${HEADERS}info@example.com
{
TAG = "info"
}
:0 E
*$ ${HEADERS}check@example.com
{
TAG = "check"
}
# ...and so on...
# now, if TAG is set, insert it into the subject
MATCH # kill this
:0 fhw
* ! TAG ?? ^^^^
* ^Subject: *\/[^ ].*
| $FORMAIL -I "Subject: $TAG - ${MATCH:-<no subject>}" |
Or you could use command line arguments: add the following
line to your .forward (shown here in alias file syntax).
foo: "|/usr/local/bin/procmail -m /usr/local/etc/pm-tagit.rc foo" |
Then in pm-tagit.rc you would instead say:
ARG = $1
:0
* ARG ?? ^^foo^^
{
TAG = "foo@go"
}
:0
* ARG ?? ^^somethingelse^^
{
TAG = "somethingelse@go"
} |
This method will work even if someone Bcc:s a message to
foo@example.com.
14.28 How to call program with parameters
...now, suppose I want to call a program with the parameter $FOUND
and get the result back in RESULT; how do I do it?
The stdout of the program will be captured and stored in the variable
RESULT. Also consider what should happen if there are spaces or tabs
in the value of $FOUND; it is better off enclosed in quotes.
# Make sure FOUND is not empty before passed to program
:0
* ! FOUND ?? ^^^^
{
RESULT = `program "$FOUND"`
} |
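What the quotes protect against can be shown in plain shell. The
sketch below only illustrates the shell's own word splitting; the
helper count_args and the sample value are invented for the
demonstration.

```shell
# count_args ARG...   (hypothetical helper)
# Prints how many separate arguments were passed in.
count_args() {
    echo "$#"
}

FOUND="two words"
unquoted=$(count_args $FOUND)     # word splitting: two arguments arrive
quoted=$(count_args "$FOUND")     # quoting keeps the value as one argument
echo "unquoted=$unquoted quoted=$quoted"
```

A program that expects a single parameter would see only "two" as
its first argument in the unquoted case.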
15.0 Miscellaneous recipes
15.1 Matching valid Message-Id header
[philip] wrote full RFC compliant matcher. Follow the link
<http://www.xray.mpe.mpg.de/mailing-lists/procmail/1998-03/msg00375.html>
dq = '"' # (literal) double-quote
bw = "\\" # (literal) backwhack
ws = "[ ]*" # whitespace
atom = "[-!#-'*+/-9=?A-Z^-~]+"
word = "($atom|$dq([^$dq\]|$bw.)*$dq)"
local_part = "$word($ws\.$ws$word)*"
domain = "(\[$ws([^][\]|$bw.)*$ws\]|$atom($ws\.$ws$atom)*)"
:0
* ! $ ^Message-Id:$ws<$ws$local_part$ws@$ws$domain$ws>
thats-non-valid-message-id |
15.2 Sending two files in a message
If you plan to send multiple files in a message, be sure that every
file has an extra blank line at the end so that they can be cat'd
together. Instead of doing
(cat THIS; echo " "; cat THAT ) | $SENDMAIL |
You do
(cat THIS THAT ) | $SENDMAIL |
But sometimes you don't have control over the files; then you can
do this to make sure there is a blank line. Notice that only two
processes are used, compared to the first choice.
(echo '' | cat THIS - THAT ) | $SENDMAIL |
[David] And an sed expert would do it this way
(sed -e '$ !b' -e '/./G' -e "r THIS" THAT ) | $SENDMAIL |
- $: the last line
- !: everywhere except the range (in this case, everywhere except
the last line)
- b: branch to a label. No label: branch to the end
(and, since -n is not in effect, print the pattern space)
Now remember that everywhere except the last line, we've
skipped ahead, so the rest of the code will be executed only for
the last line of the input.
- /./: on lines that contain a character (but we get here only for
the last line, so on the last line if it contains a character)
- G: append a newline and the contents of the hold space to the
pattern space (the hold space is empty, so basically, if the
last line was already empty, do nothing, but if the last line
was not empty, append a newline and thus add a blank line after
it).
- r file: After finishing with this run through the sed
instructions, read the named file and copy it to the output.
This side of sed comes out only after sed has had a few drinks...
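The sed walk-through above can be verified on two small scratch
files. A minimal sketch, assuming a sed that accepts several -e
expressions and file names without spaces; the wrapper name
append_with_blank is made up for this example.

```shell
# append_with_blank FIRST SECOND   (hypothetical helper)
# Prints FIRST followed by SECOND, with a blank line between them.
# The blank line is added only when the last line of FIRST is not
# already empty:
#   $!b    on every line except the last, skip the remaining commands
#   /./G   on a non-empty last line, append a newline (empty hold space)
#   r FILE queue FILE to be copied to the output after the last line
append_with_blank() {
    sed -e '$!b' -e '/./G' -e "r $2" "$1"
}
```

Note that the file named in the r command comes out last, so in the
original one-liner THAT is printed before THIS.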
15.3 Excessive quoting of message
[25 Nov 1997 buck A T Compact.COM] I administer a LISTSERV mailing
list and our host has asked us to reduce excess quoting of
previously posted material. ...Subject: asking if this was
excessive quoting. With the weights below, this extra copy will
activate at 66% quoted lines of all body lines.
[era] I would definitely tolerate 75% quotes. And in the
end, you will of course always have to face the kinds of people who
would rather change their quoting style to evade such constraints
than quote less. An idealized quote parser should perhaps realize
that a non-blank prefix that recurs on a lot of lines is probably a
customized quote string.
This will preserve the correspondent's original subject (with a Re:
added if it didn't already have one) and thus the template text
should indicate the nature of the problem.
I'm not sure what would be appropriate to generate behavior more
like I suggest below, any takers? Perhaps no score at all for empty
lines, neutralize .signatures (hope sender obeys "-- " convention)
and add 10^0.5 for each quoted line and dish out -15^0.3 for
non-quoted? (I haven't really explored this – could be completely
up the creek.) [Also, perhaps long runs of quoted material should
be penalized harder than quoted snippet – reply text – quoted
snippet – reply text alternations?]
COPY_ADDRESS = "listAdm@example.com"
:0
* ^Sender: <mailing list tag>
{
# - quoted lines
# - non-blank, non-quoted lines
# - completely blank lines
:0
*$ 10^1 B ?? ^$s*>
*$ -15^1 B ?? ^$s*[^>$WSPC]
*$ -15^1 B ?? ^$s*$
{
# You don't need to repeat the original condition here
# You also don't really need to extract SENDER
# Generate a reply with appropriate headers and the
# body quoted
:0 fhw
| $FORMAIL -rtk -A "Bcc: $COPY_ADDRESS"
# Now "replace" the body with template text + body (In
# other words, add the template before the quoted body)
:0 fbw
| cat $HOME/template.txt -
# Now send it off to recipients mentioned in generated
# header
! -t
}
# Wasn't excessively quoted; save it
:0 :
$SOME_MBOX
} |
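To get a feel for where a quoting threshold lands on real messages,
the share of quoted lines can be measured directly. This sketch is
not the scoring recipe itself: it counts every body line, blank ones
included, treats a line as quoted when it starts with optional
whitespace and ">", and the function name quoted_percent is invented.

```shell
# quoted_percent   (hypothetical helper)
# Reads a message body on stdin and prints the percentage of lines
# (blank lines included) that start with optional whitespace and ">",
# truncated to an integer. Prints 0 for an empty body.
quoted_percent() {
    awk '/^[ \t]*>/ { q++ }
         END { if (NR > 0) printf "%d\n", q * 100 / NR; else print 0 }'
}
```

Running it over the bodies of a few archived list messages shows how
many would have triggered a given limit.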
15.4 Sending message to pager in chunks
I have a 200 character limit on my pager. But I have wordy contacts
who go over that limit. What I would like to do is have a recipe
split up messages addressed to my pager into 200 character (max)
messages (Procmail mailing list 1997-12).
[era] This stuff about forwarding to pagers is a recurring
topic on this list. I've tried to find a good summary of all the
issues but there always seems to be some tiny twist to what people
would like to have implemented. As a general comment for future
generations, the Procmail part is usually trivial and the problem
reduces to writing a good program (shell script or otherwise) for
formatting the text precisely the way you want it, and spitting it out
in suitable chunks.
Here's something to split up the body of the message into smaller
chunks and do a shell script on each chunk. The -s option to fold says
to only wrap lines on whitespace if possible
# Create a duplicate of the message to forward to the pager.
# This will be reformatted and have most headers stripped off.
:0 c
{
# Construct header with only From: and Subject: retained
HEADER = `$FORMAIL -XFrom: -XSubject:`
# Reformat body as 200-character lines and send each
# as a separate message with the preconstructed minimal
# header
:0 bw
# The || test lets the loop also handle a last chunk without newline
| tr '\012' ' ' | fold -s -w 200 | \
while read line || [ -n "$line" ]; do \
echo -e "$HEADER\n\n$line" | \
$SENDMAIL pageraddress@wherever.com ; done
} |
If your version of echo doesn't understand \n to mean newline
(and/or the -e option to enable this escape processing), you need to
tweak this. (You might need to anyway – this is mostly untested. In
my limited testing, I found the messages would arrive in more or less
random order. Inserting pauses in the script should help to some
extent, but could lead to other problems and is not an ideal solution
anyhow.)
I don't know if the header trimming is required; some pager gateways
appear to count the headers as part of the message, while others
don't. Again, for future generations, details like this are relevant
to include when you ask about how to do this.
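The splitting step is easy to test on its own, without sending
anything. A minimal sketch, assuming tr and fold behave as described
above; the function name chunk_text is hypothetical.

```shell
# chunk_text WIDTH   (hypothetical helper)
# Joins the text on stdin into one long line, then re-wraps it into
# lines of at most WIDTH characters, breaking at blanks where
# possible. The echo adds a final newline, so that a "while read"
# loop reading the result also sees the last chunk.
chunk_text() {
    { tr '\012' ' '; echo; } | fold -s -w "$1"
}
```

For example, chunk_text 200 < body.txt prints one pager-sized chunk
per line; body.txt is a placeholder for a saved message body.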
15.5 Playing particular sound when message arrives
[Peter S Galbraith <galbraith A T mixing.qc.dfo.ca>] Here is
the command in shell to produce the sound:
% cat anyfile | /usr/X11R6/bin/auplay /usr/lib/exmh/drip.au |
However, it won't work directly in the recipe
procmail: Executing "/usr/X11R6/bin/auplay /usr/lib/exmh/drip.au"
Can't connect to audio server |
Strange. The command works from the shell if I su to user mail.
Anyway, I got it to work by fully specifying the audio server (which
is my workstation, where I receive mail)
AU = /usr/X11R6/bin/auplay
TUNE = /usr/lib/exmh/drip.au
:0 hwic
* ^From:.*foo@example.com
| cat > /dev/null; $AU -audio tcp/mixing:8000 $TUNE |
15.6 Combining multiple Original-Cc and Original-To headers
How can I use procmail/formail to combine the information in these
headers into their CORRESPONDING header, MINUS the Original- prefix?
Note that I can have multiple Original-Cc: headers and I want all
the recipients combined into one Cc: header.
# 1998-01 by [david]
# initialize as unset
ORIG_TO ORIG_CC
# The -c option to formail takes care of headers continued onto
# indented lines; the pipe to tr takes care of multiple
# Original-To: headers by linking their contents with commas.
:0
* ^Original-To:.*[^ ]
{
ORIG_TO = `$FORMAIL -zcxOriginal-To: | tr \\12 ,`
}
# Drop trailing comma from tr:
:0 A
* ORIG_TO ?? ,^^
* ORIG_TO ?? ^^\/.*[^,]
{
ORIG_TO = $MATCH
}
# Likewise for Original-Cc: lines:
:0
* ^Original-Cc:.*[^ ]
{
ORIG_CC = `$FORMAIL -zcxOriginal-Cc: | tr \\12 ,`
}
:0 A
* ORIG_CC ?? ,^^
* ORIG_CC ?? ^^\/.*[^,]
{
ORIG_CC = $MATCH
}
# Now, let's install the changes if needed:
# with -A instead of -I or -i it should
# not clobber existing To: or Cc: information.
# -A : Append a custom header field onto the header in any case.
:0
* ORIG_TO ?? ^^^^
* ORIG_CC ?? ^^^^
{ }
:0 E fhw
| $FORMAIL \
${ORIG_TO:+-A "To: $ORIG_TO"} \
${ORIG_CC:+-A "Cc: $ORIG_CC"} |
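The joining step can also be tried in plain shell. Note that this
sketch swaps in paste for the tr pipeline used in the recipe, which
is a different technique with the same goal; the function name
join_with_commas is made up.

```shell
# join_with_commas   (hypothetical helper)
# Reads one address per line on stdin and prints them as a single
# comma-separated line. paste -s puts the delimiter only between
# lines, so there is no trailing comma to strip afterwards.
join_with_commas() {
    paste -s -d, -
}
```

With the tr approach a trailing comma is left behind, which is why
the recipe needs the extra clean-up step.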
15.7 Forwarding sensitive messages in encrypted format
[<Valdis.Kletnieks A T vt.edu>] Please note that the standard Unix
crypt(1) command is not secure, as it uses a modification of the
Enigma engine, which was broken by the Bletchley Park guys (Turing
and the rest) back during WWII, using a mechanical relay based
computer. As such, it is trivially easy to break using any computer
more recent than a Radio Shack TRS-80. Poke around in any of the
comp.sources.unix archives, they had a "Crypt Breaker's Workbench"
posted well over a decade ago. For similar reasons, I can't
recommend single-pass 56-bit DES anymore either. Triple-DES (with
an effective 112-bit key) looks safe, as do any of the encryptions
provided with PGP.
# by [alan]
# See if addressed *directly* to me, and ..
# ..has not already been forwarded
KEY = "TheMagic"
FORWARD_EMAIL = "foo@example.com"
:0
*$ ^To:.*$LOGNAME(@|[^0-9a-z]|$)
*$ ! ^$MYXLOOP
{
# now let's encrypt the body using mimencode
:0 fbw
| echo "MIME-Version: 1.0" ; \
echo "Content-Type: application/crypt" ; \
echo "Content-transfer-encoding: base64" ; \
echo "" ; \
crypt $KEY | mimencode -b
# Now let's prepare the headers for forwarding the mail,
# and mark it so we don't loop
:0 fhw
| $FORMAIL -I"Resent-To: $FORWARD_EMAIL" -I"$MYXLOOP"
:0
! $FORWARD_EMAIL
} |
16.0 Procmail and PGP
16.1 Decrypt pgp messages automatically
Warning: if you use remailers or anonymous services, you must use
different passwords and different user id's to decrypt incoming
messages. If you just receive messages encrypted with one key, then
this may be useful to you. However, it is generally considered a
huge security risk to keep your password carved into your .procmailrc.
:0 fbw
* B ?? PGP ENCRYPTED MESSAGE
| pgp -z "your pass phrase" -f +batch 2>&1 |
16.2 Getkeys from key server
# by Adam Shostack <adam A T bwh.harvard.edu> 1996-02
#
# This first ruleset protects me from mailbombs from an automated
# service that I often send incorrect commands to, generating 5mb
# of reply. It also sorts based on success of the command.
#
# swissnet.ai.mit.edu is fast key server
:0
* From bal@swissnet.ai.mit.edu
{
:0 h
* >10000
/dev/null
:0 h
*^Subject:.*no keys match
/dev/null
:0 E
| pgp +batchmode -fka
} |
16.3 Auto grab incoming pgp keys
# [Opher Kahn <kahn A T dg-rtp.dg.com>] This first
# ruleset protects me from mailbombs from an automated
# service that I often send incorrect commands to,
# generating 5mb of reply. It also sorts based on success
# of the command.
#
# swissnet.ai.mit.edu is PGP key server
:0
* From bal@swissnet.ai.mit.edu
{
:0 h
* >10000
/dev/null
:0 h
*^Subject:.*no keys match
/dev/null
:0 E
| pgp +batchmode -fka
}
# auto key retrieval
#
# I have an elm alias, pgp, that points to a key server. The log
# file gets unset briefly to keep the elm lines out of my log file.
:0 W
* B ?? -----BEGIN PGP
* H ! ?? ^FROM_DAEMON
{
KEYID = `/usr3/adam/bin/sender_unknown`
}
LOGFILE=
# #todo: We should get rid of the 'elm' dependency here.
# #todo: correct this sometime... [jari]
#
:0 ahc
* ! ^X-Loop: Adams autokey retrieval
| $FORMAIL -a"X-Loop: Adams autokey retrieval" | elm -s"mget $KEYID" pgp
#!/bin/sh
#
# Script: sender_unknown
#
# unknown returns a keyid, exits 1 if the key is known. $output
# is to get the exit status. Otherwise, this would be a one
# liner.
OUTPUT=`pgp -f +VERBOSE=0 +batchmode -o /dev/null`
echo $OUTPUT | egrep -s 'not found in file'
EV=$?
if [ $EV -eq 0 ]; then
echo $OUTPUT | awk '{print $6}'
fi
exit $EV
# end of sender_unknown |
17.0 Includerc usage
17.1 Using: multiple rc files
...Do INCLUDERC statements function as a kind of "call" which
returns control to the "original" rc file if processing falls off
the end of the included rc file? Or if processing falls off the
end, does mail then get delivered to $DEFAULT and processing
stop? Suppose I have these commands
INCLUDERC = $PMSRC/pm-a.rc
INCLUDERC = $PMSRC/pm-b.rc
INCLUDERC = $PMSRC/pm-c.rc |
Yes, control is returned to the original file from which the
includerc was called. And no, mail does not get delivered to
$DEFAULT just because the includerc ends: processing continues
until there are no more statements at the top level.
An includerc is nothing more than a sliced-out piece of the top
level rc file.
17.2 Using: call rc file conditionally
One interesting way to prevent false hits when filtering UBE is to
first check whether the message comes from some known, valid origin.
If it does, then it shouldn't be run through the UBE filter,
because the filter may throw valid messages out. No UBE filter is
completely bullet proof.
Here is an example where the UBE detection is put into use only when
the message comes from somewhere that I don't know beforehand (or I
have just forgotten to tweak my .procmailrc)
ME = "(me@here.example.com)"
LISTS = "(procmail|list-a|list-b)"
:0 # Idea by Bill Moseley
*$ ! ^TO_()$ME
*$ ! $LISTS
{
# Could be UBE or I might be on a unknown distribution list.
INCLUDERC = $PMSRC/pm-ubecheck.rc
} |
[dan] That would work; common practice, however, is to put recipes
for filing mail from lists (and, per Bill's preferences, anything
mentioning procmail in the head gets treated the same as mail from
this list) first; then the only remaining condition to consider
there would be unexpected blind carbons: * ! ^TO_moseley. This
method is good if you get much more spam than legitimate mail
(including mail from list subscriptions as legitimate) and you want
procmail to deal with spam right away. I belong to several very
active mailing lists, so I actually receive more pieces of
legitimate mail than pieces of spam.
One way to get the best of both worlds is this:
*$ ! ()\/(^TO_$LOGNAME|procmail|list-(ABC|123|XYZ)) |
because then, if the regexp matches (and thus the negated condition
fails and you don't detour into $PMSRC/checkspam.rc), MATCH is
already set to the name of the mailing list, and you can do further
tests by just examining MATCH (or a variable you copy it into)
instead of repeating a complete head search. [I prefer to use
the variable $LOGNAME rather than hard-coding my name because then
others can use the code, and I can use it unchanged on sites where
my logname is different, and if my logname is changed my
procmailrc will keep up with it.] For example (I've separated the
conditions into two lines so that, per Bill's preferences, a
mention of procmail in the head will get the message into the
Procmail List folder, even if a match to $^TO_$LOGNAME is also
present and appears sooner):
:0
* ! ()\/(procmail|list-(ABC|123|XYZ))
*$ ! ^TO_$LOGNAME
{
INCLUDERC=$PMSRC/pm-ubecheck.rc
}
# The next recipe has an `E' flag, so it will be examined
# only if the preceding one didn't match; thus if $MATCH was
# set inside pm-ubecheck.rc, it won't hurt anything here, and a
# value for $MATCH set in pm-ubecheck.rc
# won't be mistaken for a list name:
:0 E: # MATCH is non-null only if it matched a list name
* MATCH ?? (.)
$MATCH
# Remaining recipes will be read only for two types of mail:
# those that met $^TO_$LOGNAME but not any expected list
# name, and those that went through pm-ubecheck.rc but came out
# undelivered. |
17.3 Using: autoloading an rc file
Now that you know that includerc can be called conditionally, let's
discuss "autoloading of a module". For example, you may see the
following statement in modules that import predefined variables:
:0
* ! WSPC ?? ( )
{
INCLUDERC = $PMSRC/pm-javar.rc
} |
It says that "If variable WSPC does not contain space, then load
module". If the module has already been loaded by some other rc
file, the WSPC would exist. If it does not exist yet, then the
module is loaded. This is a classical example of conditionally
loading functions or variables into the current module:
Check if feature is present. No? Then load the module. |
Justin Lloyd <jlloyd A T harris.com> suggests a general way of
caching the included rc files. Use a top-level script that
records every module that was included. The module is loaded
only if it is not yet included:
# pm-xximport.rc
:0
*$ ! INCLUDE_CACHE ?? ()\<$RC\>
{
# Module was not there yet, add it to the list
INCLUDE_CACHE = "$INCLUDE_CACHE$RC$NL"
INCLUDERC = $RC
} |
This is a different approach than the previous one. Instead of
checking features, the presence of the module is checked. These are
two sides of the same coin and can be used for the same thing. You
can pick either solution, but here are some thoughts:
- Adding extra top level INCLUDE_CACHE is extra work. Procmail
must open a separate top-level rc file every time with call
RC="pm-xxscript.rc" INCLUDERC=pm-xximport.rc |
- If the feature already existed, you would still have to open the
pm-xximport.rc file on every call to find that out. E.g. here
pm-xximport.rc is opened 3 times, no matter whether 1, 2 and 3 were
already present
RC="pm-xxscript1.rc" INCLUDERC=pm-xximport.rc
RC="pm-xxscript2.rc" INCLUDERC=pm-xximport.rc
RC="pm-xxscript3.rc" INCLUDERC=pm-xximport.rc |
With previous simple feature test, procmail can evaluate the
condition in place without the need of opening separate file:
if no feature present..
then load
if no feature present..
then load |
Note, however, that both suggestions accomplish the same thing; only
the implementation is different. If the typical number of RC files
included per module were big enough, I'd use Justin's way. Usually
it's only a few, say one or two, whose purpose is to define
variables or get date information.
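The bookkeeping behind the include cache is small enough to try in
plain shell. This is a sketch of the idea only, not of procmail's
behaviour: include_once is an invented name, and the cache is a
shell variable with "|" used as a separator.

```shell
# include_once NAME   (hypothetical helper)
# Succeeds (exit 0) the first time NAME is seen and remembers it in
# INCLUDE_CACHE; fails (exit 1) on later calls, meaning "already
# loaded, skip it".
INCLUDE_CACHE=""
include_once() {
    case "$INCLUDE_CACHE" in
        *"|$1|"*) return 1 ;;
    esac
    INCLUDE_CACHE="$INCLUDE_CACHE|$1|"
    return 0
}
```

Each call costs one lookup in the cache, which parallels the point
made above: the cache has to be consulted on every call, whether or
not the module was already loaded.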
17.4 Making: naming of the rc file
When you write an rc file, think whether or not it could be
generalized so that others could use it. You could adopt a style
where all procmail files start with prefix pm, so that they can
be stacked with other files in the same directory. If you simply
named them as rc.*, look what happens:
% ls rc* # fine, print rc files |
but if you would like to list all procmail-related files and back
them up with one command, a common starting prefix is better:
% ls pm-*
--> pm-mytest.rc
pm-jaube.rc
pm-tips.txt
pm-art.txt
pm-incoming.log
pm-list.mbox # the mailing list |
A name format could be pm-xxSCRIPT-NAME.rc for an rc file, where
xx is the initials of the first name and surname, like (J)ohn (D)oe.
These scripts are product versions that can be distributed. There
are usually also private scripts that handle other things, like
mailing lists, work messages and so on. They could have the prefix
my.
pm-jdscript.rc
pm-myscript.rc << private version |
When downloading someone else's script it would be good if its name
were unique to the person who made it:
pm-ajscript.rc # Average Joe's script. |
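To illustrate the benefit of a common prefix, here is a minimal shell
sketch that backs up every procmail-related file with one command. The
function name backup_pm_files and both directory arguments are made up
for this example; adjust them to your own layout.

```shell
# Copy every file that carries the pm- prefix into a backup directory.
# Usage: backup_pm_files SOURCE_DIR BACKUP_DIR   (both are hypothetical)
backup_pm_files() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/pm-*; do
        [ -f "$f" ] || continue      # glob matched nothing, or not a file
        cp -p "$f" "$dest"/ || return 1
    done
}
```

Files named rc.* would need a second pattern and could collide with
unrelated rc files, which is the point of the pm- prefix.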
17.5 Making: Using name space when saving procmail variables
If you're going to write an rc file that works like a subroutine in
any other programming language, you must separate it from the
world and make it well behaved. A subroutine is traditionally a
black box: you call it with arguments and it responds with
returned values. You don't need to know what happens in there. And
you expect that the subroutine hasn't changed the existing
environment, like the procmail variables DEFAULT, LOGFILE etc.,
when it ends.
So the process diagram of a good RC subroutine is:
                  pm-xxscript1.rc
call          --> +------------+
arguments         |   black    | --> it may call
                  |   box      |     other subroutines
                  |            | <-- pm-xxscript2.rc
output values <-- +------------+ |
Procmail does not have local variables, so you must put the
variables in the global name space. Let's see an example where a
subroutine uses MAILDIR for chdir purposes.
MAILDIR_xxscript1 = $MAILDIR # save
...
MAILDIR = new location
...
...at the end of subroutine
MAILDIR = $MAILDIR_xxscript1 # restore |
Here the original value is saved when the subroutine starts and
restored when the subroutine exits. The global name used (suffix
xxscript1) is unique, so it will not clash with anyone else's. If
pm-xxscript2.rc had also used MAILDIR, its saved value would have
been in MAILDIR_xxscript2, and the two wouldn't mix up each other's
MAILDIR. The general name for a saved variable is therefore the
variable's name followed by an underscore and the script's name.
This follows the simple "onion" or "stack" model, where a variable's
value is saved before changing it and restored at the exit point.
save-x-1
set--x-1
save-x-2
set--x-2
..
restore-x-2
restore-x-1 |
17.6 Making: Public and private variables in rc file
As you learned above, the variables should be put in the RC file's
name space. The user interface variables (public) should be all caps
and private variables should start with a lowercase letter. Whether
you use "theVarStyle" or "the_var_style" is up to you.
[script pm-xxscript.rc]
# ........................... public
XX_SCRIPT_FLAG = ${XX_SCRIPT_FLAG:-"default"}
XX_SCRIPT_VAR = ${XX_SCRIPT_VAR:-"default"}
# ........................... private
charset = "a-z1-2"
regexp = "something-that-matches" |
Whether you need to add the prefix xx_script to the private variables
depends on whether you call another includerc which may happen to use
the same names as you:
[pm-xxscript.rc]
charset = ... # watch this
...
INCLUDERC = .. # call another subroutine
charset = .. # holy cow, it used same variable
..back in the pm-script.rc
:0
* $charset # BOOM, not what you think. |
In this case it would be wise a) not to define charset at the top
of the file but to move the definition to just before the recipe
where it is used or b) make the name unique, with
xxScriptCharset.
17.7 The rules of thumb for constructing general purpose rc file
- Write good documentation at the beginning of file: how to set up
the includerc and explain what it does. If you don't include
docs, people may skip your extraordinarily useful script. Also,
remember that the script lives in the Net and passes through many
hands long after you have been disconnected.
- Keep the layout like this: the user interface variables must all
be in capital letters. Familiarize yourself with what(1) tags
too. Notice the first and last lines: if you keep the format
like this, then any universal tool can rip your code from any
file (or mail), because it's delimited by "pm-xxScript.rc -- "
and "end of pm-xxScript.rc". See Unix what(1) for first line's
syntax.
# pm-xxScript.rc -- procmail script for ...
# DOCS
USER VARIABLES
private variables
CODE
# end of pm-xxScript.rc |
- Always include version number or last modification date somewhere.
Prefer some version control tool, like RCS, VCS, ClearCase,
whatever you have at hand.
- Use a variable name like dummy in appropriate places to tell
what's happening in the code. Remember that the VERBOSE
setting isn't much help if you can't tell by looking at the
LOG where on earth the code is executing.
dummy = "start of pm-xxScript.rc"
...
dummy = "Now testing if we have control message XXX"
:0
* condition
{
dummy = "Now testing if the command is YYY"
:0
* condition
...
}
...
dummy = "end of pm-xxScript.rc" |
- If you need the value of some common header, don't just call
formail like this, because the value may already be available
prior to your includerc. For example, the user may already have
needed the Subject value and stored it in a variable
[in pm-xxScript.rc]
XX_SCRIPT_SUBJECT = `$FORMAIL -xSubject:`
[User may have already read the content to SUBJECT]
SUBJECT = `$FORMAIL -xSubject:`
INCLUDERC = $PMSRC/pm-xxScript.rc
Your pm-xxScript.rc launches an unnecessary formail call. Instead,
use the existing SUBJECT.
[user]
:0
* ^Subject:\/.*
{
SUBJECT = $MATCH
}
...
XX_SCRIPT_SUBJECT = $SUBJECT # Note this!
INCLUDERC = $PMSRC/pm-xxScript.rc
[ in the pm-xxScript.rc variable definitions ]
# User should initialize the variable
# XX_SCRIPT_SUBJECT if he already has read the
# subject.
:0
* XX_SCRIPT_SUBJECT ?? ^^^^
* ^subject:\/.*
{
XX_SCRIPT_SUBJECT = $MATCH
}
...the rest of the code |
- Add an X-Loop header and test against it if you are sending an
automated reply. The X-Loop header prevents responding to a message
that has already been answered.
:0
* condition
* ! ^FROM_DAEMON
*$ ! ^$MYXLOOP
{
# Ok, now we're clear to send an automated reply
} |
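The delimiting comment lines mentioned in the layout rule above make
extraction easy. Here is a minimal sketch with sed, assuming the script
starts with a line "# NAME -- ..." and ends with "# end of NAME". The
function name rip_rc is hypothetical, and script names containing a
slash are not handled.

```shell
# Print the rc script NAME embedded in FILE (for example a saved mail).
# Usage: rip_rc NAME FILE
rip_rc() {
    name=$1
    file=$2
    # Escape dots so that they match literally in the sed patterns
    pat=$(printf '%s' "$name" | sed 's/\./\\./g')
    # Print from the "# NAME -- " line through the "# end of NAME" line
    sed -n "/^# $pat -- /,/^# end of $pat\$/p" "$file"
}
```

For example, "rip_rc pm-xxScript.rc saved-mail > pm-xxScript.rc" would
recover the script from a saved message, leaving the mail headers and
signature behind.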
17.8 An includerc skeleton
Here is my includerc file skeleton that I use in all my modules. The
funny looking ".$" are for the text2HTML Perl filter. The
documentation section can be ripped and turned into HTML very easily
if you just keep the standard 4 tab column positions and start the
description with "File id" and end it with "Change Log". The command
to make the HTML is:
% ripdoc.pl pm-xxscript.rc | t2HTML.pl > pm-xxscript.html |
These two perl files are available:
# pm-xxscript.rc -- one line description string here
#
# File id
#
# Copyright (C) 1997-98 Foo Bar
#
# This code is free software in terms of GNU GPL v2 or later
#
# Description
#
# This subroutine Parses <what> from variable INPUT
#
# Required settings
#
# PMSRC must point to source directory of procmail code.
# This subroutine will include
#
# o pm-xxScriptA.rc
# o pm-xxScriptB.rc
#
# Call arguments (variables to set before calling)
#
# o INPUT, the string from where to parse...
# o VAR1, description, default is ...
# o VAR2, description, default is ...
#
# Returned values
#
# ERROR will have value "yes" if couldn't parse INPUT
# OUTPUT will have result after successful parse
#
# Example usage
#
# :0
# * condition\/.*
# {
# INPUT = $MATCH
# INCLUDERC = $PMSRC/pm-xxscript.rc
# # OUTPUT has the result
# }
#
# Change Log: (none)
# ..................................................... &init ...
dummy = "init: pm-xxscript.rc start"
# Read the standard variable definitions if they are not
# yet defined: that's "if the WSPC variable does not contain a
# space, as it should, then global variables haven't been read yet"
:0
* ! WSPC ?? ( )
{
INCLUDERC = $PMSRC/pm-javar.rc
}
# .................................................... &input ...
# - User configurable variables with reasonable defaults
# - But parameters like "INPUT" that must be set beforehand
# are not mentioned here.
VAR1 = ${VAR1:-"default1"}
VAR2 = ${VAR2:-"default2"}
# .................................................... &do-it ...
dummy = "subroutine: pm-xxscript.rc parses now that and that"
<the code>
dummy = "subroutine: pm-xxscript.rc end."
# end of pm-xxscript.rc |
18.0 Mailing list server
|
Note:
These examples are for ad-hoc lists. The procmail language is not
suitable for handling complex mailing list administration,
although there is a procmail-based MLM called SmartList. The de
facto MLM software with a web-based interface is nowadays the
Python-based GNU Mailman.
|
Simple Mailing list server
# by Lars Hecking <lhecking A T nmrc.ucc.ie>
#
MAJORDOM = "majordomo-(users|docs|workers)"
:0 w
*$ ^(Sender|To|Cc):.*\/$MAJORDOM
*$ MAJORDOM ?? ()\/$\MATCH
| $APPNMAIL $LISTS/$MATCH |
Here is another, by Brock Rozen <brozen A T torah.org> with ideas
from [dan]
# get the date in RFC822 format for insertion into some messages;
# the "Resent-Date:" field is copied from the "Date:" field on
# some systems. RFC1123 says "All mail software SHOULD use 4-digit
# years in dates..."
LIST_NAME = "myList"
LIST_ADDR = "$LIST_NAME foo@example.com"
LIST_DATE = `date '+%a, %d %h %Y %H:%M:%S %Z'`
LIST_ERR = "$EMAIL" # my admin address
# Sendmail ignores "To:" in the presence of "Resent-To:"
#
:0 fhw
*$ !^X-List: $LIST_NAME
*$ ^TO()$LIST_NAME
| $FORMAIL \
-A "X-List: $LIST_NAME" \
-I "Resent-To: $LIST_ADDR " \
-i "Resent-Date: $LIST_DATE" \
-I "Errors-To: $LIST_ERR" \
-A "Precedence: bulk" \
-A "X-Loop: $COMSAT"
:0 a
! -oi `cat /var/tmp/src/power-users.list` |
19.0 Common troubles
19.1 Procmail modes: normal, delivery, and mail filter.
... a) what recipes procmail goes through if there's no
/etc/procmailrc on the system b) how it decides whether an
address/local-part is valid or not c) how procmail selects the
mailbox to drop the mail
[philip] Delivery mode is invoked using the -d flag. All
arguments after the -d are user names. It is usually used by the MTA
to deliver mail to users, and indeed, procmail will return failure
if it is given an invalid user name. In delivery mode, procmail
reads /etc/procmailrc before the user's .procmailrc.
Note: Procmail will work in delivery mode only if it is
setuid root, if it is invoked with the ruid of the recipient named
in -d, or, under certain OSes where the build routines have
determined that it is safe, if the euid is that of the recipient
and the egid is the recipient's login group.
Mailfilter mode is invoked using the -m flag. It accepts
only one rcfile as an argument – other arguments are either
variable assignments or arguments that are made available to the
rcfile itself as $1, $2, etc. If the specified rcfile is located
under /etc/procmailrcs/ then procmail will take on the uid of the
owner of that file. Otherwise, it will run as the user who invoked
it. /etc/procmailrc, that procmail -d reads, is ignored. In
mail filter mode, procmail unsets ORGMAIL and DEFAULT to
suppress normal delivery – reaching the end of the rcfile results
in the mail bouncing. If the rcfile sets either of them then
procmail will attempt delivery to that mailbox if it falls off the
end of the rcfile; however, the mailbox will have to be writable by
the uid/user that procmail is running as.
Note: Only one rcfile can be named on the command line, but names
of other rcfiles can be passed in the positional parameters to be
used later in INCLUDERC assignments.
Normal mode is invoked by not using the -m or -d flags. It
accepts any number of rcfiles and variable assignments as
arguments. Procmail runs as the invoking user in this mode.
/etc/procmailrc is ignored.
So, to answer your questions: if procmail reaches the end of the
specified rcfile, it bounces the mail (/etc/procmailrc is ignored).
Everything is up to the rcfile – how to determine whether the
address is valid and where to put the message if it is.
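As a rough summary of the three modes, here are the command lines
wrapped in small shell functions. The function names and the PROCMAIL
variable are only illustrative; the -d and -m flags are the ones
described above. In each case the mail message is read from standard
input.

```shell
# PROCMAIL can be pointed at another binary, or at "echo" for a dry run
PROCMAIL=${PROCMAIL:-procmail}

# Delivery mode: every argument after -d is a user name
deliver_mode() {
    "$PROCMAIL" -d "$@"
}

# Mailfilter mode: exactly one rcfile; the remaining arguments are
# variable assignments or become $1, $2, ... inside the rcfile
filter_mode() {
    rcfile=$1
    shift
    "$PROCMAIL" -m "$rcfile" "$@"
}

# Normal mode: any number of rcfiles and variable assignments
normal_mode() {
    "$PROCMAIL" "$@"
}
```

Setting PROCMAIL=echo before calling a function prints the arguments
that would be passed instead of running procmail.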
19.2 Procmail as sendmail Mlocal mail filtering device
...I'm a new sys admin at my company, and I've been trying to set up
Procmail as the mail filtering device (still using mail as the
Mlocal) I've tried setting up the sendmail.cf to use Procmail as
a filter (we want to use the current mailer as the local mailer)
with one local procmail rc file. Procmail seems to work just fine
if set up as the local mailer, but I'm still having problems
setting it as the filter.
[John M Vinopal <banshee A T abattoir.com> answers sendmail.cf]
R$+ < @ $=a . > $*
$#procmail $@ /etc/mail/procmailrc $: $1 < @ procmail > $3
R$+ <@ procmail > $* $1 < @ example.com .> $2 |
So this sends anything of the form foo@example.com through procmail
and rewrites it as foo@procmail. The procmail script reinjects it,
it bypasses the call to procmail, and then it is rewritten back to
foo@example.com.
/etc/mail/procmailrc:
:0
! -oi -f "$@" |
19.3 Procmail doesn't pass 8bit characters
You're mistaken. Procmail does not do that to your mail.
Frank Gadegast <phade A T powerweb.de> tells you:
- procmail wasn't the problem, it was sendmail
- I commented out this line in sendmail.cf
and now I get all the nice German umlauts.
# strip message body to 7 bits on input?
# O SevenBitInput |
The problem was that some mail that ran through the local mailer
procmail arrived all right (local mail), while mail from external
hosts (which dropped into my most used mailbox, where I use a
procmail filter) did not arrive all right. This made me think it was
procmail, but that mail came from outside and sendmail was to
blame.
19.4 My ISP isn't very interested in installing procmail
...I recently requested my ISP to install procmail, and they
responded by saying no. Their main reason was they did not wish to
incur the traffic from any/ all of their subscribers setting up
mailing lists.
[Jon Lewis <jlewis A T inorganic5.chem.ufl.edu>] Wouldn't you need
write access to either /etc/aliases or /etc/procmailrc to setup
mailing lists? Tell the ISP that procmail will greatly improve mail
delivery and enable all users to filter out junkmail without ever
seeing it. If they still refuse, find a better ISP.
19.5 My ISP has systemwide procmailrc; is this a good idea?
[eli] I, for one, do not like my ISPs to put stuff in
/etc/procmailrc. There is precious little I will gain from that and
plenty of opportunity for them to make mistakes I would not have. At
one ISP I know people got upset at some sendmail level filtering of
mail. One of those upset is a habitual complain-to-spammer-ISP
person. He did not want problems seeming to go away if they were
really there. Another guy just didn't trust the filtering.
Writing a shell script that will give the user a .procmailrc which
includercs a system wide shared procmailrc is the best way to do it.
This forces the filtering to be "opt-in".
19.6 Procmail changes mailbox and directory permissions
By Ed McGuire <emcguire A T i2.com>. Before procmail was used:
> -rw-rw---- 1 foo mail 1127 Sep 11 07:33 foo |
After:
> -rw------- 1 foo mail 1517 Sep 11 07:34 foo |
When the UMASK environment variable is more restrictive than the
mode of the mailbox, procmail changes the mode of the mailbox. The
default value of UMASK is 077. If you want to preserve the group
access to your mailbox, I think you can set UMASK to 007 in the
rcfile.
Further note: the above UMASK suggestion in .procmailrc does not
work. See the comment by Gjermund Sørseth <gjermund A T nextel.no>:
However the permissions on DEFAULT are handled before procmail
even opens the .procmailrc, so changing the umask there will have
no effect on the mailspool.
[Scott J. Kramer <sjk A T lux.com>] It's documented in the
MISCELLANEOUS section of the procmail(1) man page:
If /var/mail/$LOGNAME already is a valid mailbox, but has got too
loose permissions on it, procmail will correct this. To prevent
procmail from doing this make sure the u+x bit is set.
Otherwise, you might notice a syslog message like:
procmail: Enforcing stricter permissions on "/var/mail/sjk"
when it chmod's the file to 600. As you've discovered, this is
inconsistent with the SYSV (Solaris 2 anyway) default mailbox
protection of 660, gid=6 (mail). I think that's an OS-dependent bug,
with the `chmod u+x ...' as the workaround.
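The man page quote above suggests setting the u+x bit so that procmail
leaves the mailbox mode alone. A minimal shell sketch of that
workaround follows; the function name keep_mailbox_mode is made up
here, and you would pass your real spool file, for example
/var/mail/$LOGNAME.

```shell
# Set the user-execute bit on a mailbox so that, per the man page text
# quoted above, procmail does not tighten its permissions.
keep_mailbox_mode() {
    mbox=$1
    if [ ! -f "$mbox" ]; then
        echo "no such mailbox: $mbox" >&2
        return 1
    fi
    chmod u+x "$mbox"
}
```

A mailbox with mode 660 ends up as 760; the group bits are not changed
by the chmod itself.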
19.7 Changing mbox permission during compilation to 660
...it appears that mail that procmail delivers back into the
spool it is writing out with owner.group user.mail and rights
600. To me this is reasonable. Mail delivered to the spool by
/bin/mail is written out owner user, group mail 660.
When procmail delivers mail 600 later attempts at delivery with
procmail removed from the .forward file fail: /bin/mail doesn't
have permissions (or refuses to uses its permissions).
Since we have fickle and unruly users who will be moving their
forwards in and out of place this is a problem.
Is the correct solution to force procmail to write 660? If so, how
is this done? I assume in the section of config.h just below the
warning about only messing with a section if you think you know
what you are doing. I don't feel like I know well enough what
I'm doing to walk into that territory without some guidance.
[alan] I used to be the manager of the system support in the
College of Engineering, at the University of California, Santa
Barbara.
We supported about 1500 users from two HP 9000 G30's, using one of
them as the centralized mailer. Mail was available via NFS exported
/usr/spool/mail to over 200 workstations, of all kinds: SGI, HP,
Sun, etc.
We replaced /bin/mail with procmail as the local mailer (Mlocal)
because procmail correctly avoided NFS-locking problems, and it
supported user-configurable mail filtering, without compromising
system security.
In over two years subsequent to the change, we had no loss of mail
due to procmail being used as the local mailer. If you wish further
comment from the current system managers, send mail to
"postmaster A T eci.ucsb.edu".
To answer your specific questions:
* you can configure the permissions directly, by changing one of the
following defines in config.h:
/* bit set on mailboxes when mail arrived */
#define UPDATE_MASK S_IXOTH
/* if found set */
#define OVERRIDE_MASK (S_IXUSR|S_ISUID|S_ISGID|S_ISVTX)
/* the permissions on the mailbox will be left untouched */
#define INIT_UMASK (S_IRWXG|S_IRWXO) /* == 077 */
#define GROUPW_UMASK (INIT_UMASK&~S_IRWXG) /* == 007 */ |
We did not find it necessary, however:
- We did disable all locking except dot-locking, since the kernel
locks were the source of the NFS-locking problems. There have
continued to be occasional locking problems, but these are
"victim"-induced problems caused by using non-supported and
discouraged mailers, such as "mailtool" from older Suns. These
locking problems have nothing to do with mail delivery; they come
from the mail client using kernel-advisory locks and then orphaning
them or leaving them locked all day long.
- An alternative to having users use .forward files, is to create a
file of users who would like to use procmail as their local
delivery agent, and use this file to initialize a class
variable.
Write a special rule in sendmail.cf which delivers mail using
Mprocmail instead of Mlocal when the destination user is in the
special procmail user class.
This lets users who want direct procmail delivery have it, in spite
of management's worries.
I set this up to test procmail delivery on our system before
changing Mlocal to use procmail. We placed some "volunteer" users in
the procmail class file, and they never had any problems (I was one
of them).
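The two masks in the config.h excerpt above, 077 and 007, decide which
permission bits a newly created mailbox loses. The following shell
sketch only demonstrates that arithmetic with the shell's own umask; it
does not touch procmail, and the function name mode_under_umask is made
up for this example.

```shell
# Create an empty file under the given umask and print its mode string.
# Usage: mode_under_umask 077
mode_under_umask() {
    dir=$(mktemp -d) || return 1
    # The subshell keeps the changed umask from leaking to the caller
    ( umask "$1" && : > "$dir/mbox" ) || return 1
    ls -l "$dir/mbox" | cut -c1-10
    rm -rf "$dir"
}
```

With 077 the file comes out readable and writable by the owner only;
with 007 the group keeps read and write access, which matches the 600
versus 660 difference discussed in the question.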
19.8 The .forward file must be real file
http://www.math.fu-berlin.de/~guckes/mail/forwarding.html
...I tried to make a softlink to ~/.forward, but then my
procmail wouldn't run. When I made a real ~/.forward file, then it
worked again. My question is – why would procmail treat a link to a
file any differently than the actual file itself?
ln -s ~/.procmail/forward ~/.forward |
[Werner Reisberger <wr A T tribe.ping.de>] That's not a problem with
procmail, this is an MTA issue. For security reasons sendmail will
not deliver mail to files which are symlinks.
[david] procmail has restrictions on what permissions it will tolerate
on an rcfile. For example (I'm just guessing here) it can tell whether
it can read the target file but it cannot tell who might be able to
write to it. This prevents a major security hole.
You can make a hard link to the file, since a hard link is completely
indistinguishable from the original file. But note: a file hard-linked
to two or more names is very distinguishable from a file with only one
(hard) link, and procmail, for example, will not deliver to a plain
folder that has two or more hard links.
You can also put the real file at ~/.forward and let
~/.procmail/forward be a symlink to it.
[<mikk0022 A T maroon.tc.umn.edu>] I suppose, the reasoning behind
procmail's folder policy is that procmail locks the file by name, not
inode. Hence it cannot guarantee mutual exclusion for access to a file
which has multiple names.
My understanding of the .forward policy is that a symlink need not
share the permissions of its target. Therefore somebody's .forward
symlink could have proper permissions, while its target could be
writable by others. This would allow anybody with the write
permissions to execute any program (potentially) from the user's
forward file.
Two hard links share the same permission, so this argument doesn't
hold.
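To see which of the cases discussed above a given file falls into,
something like the following shell sketch can help. The function name
check_forward and its output words are invented for this example; it
only reports what the file system says and makes no claim about what a
particular sendmail or procmail version will accept.

```shell
# Report whether FILE is a symlink, is missing, has several hard
# links, or is an ordinary single-link file.
check_forward() {
    f=$1
    if [ -L "$f" ]; then
        echo symlink
    elif [ ! -f "$f" ]; then
        echo missing
    else
        # The second field of "ls -ld" is the hard link count
        links=$(ls -ld "$f" | awk '{print $2}')
        if [ "$links" -gt 1 ]; then
            echo multilink
        else
            echo ok
        fi
    fi
}
```

Running "check_forward ~/.forward" after the ln -s shown in the
question would print "symlink".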
19.9 Using .forward if procmail already is LDA
[Elie Rosenblum <fnord A T jurai.net>] If you have a .forward, it is
used by sendmail to replace a call to the LDA for the user in
question. So if you have a .forward that doesn't call procmail,
procmail is ...
[david] Elie sent the answer to me with a carbon to the list, but
since reading my personal copy my inbox got trashed. As of this
writing the list copy hasn't reached me, but the rest of that
sentence (as I recall from reading it before it got hosed) was to
the effect that procmail is then never invoked at all on your
incoming mail; a .forward takes precedence over the LDA. That
scenario never occurred to me. Thank you for explaining.
[Philip] Scratch the bit about /etc/procmailrcs/$LOGNAME. You're
mixing up procmail -d with procmail -m.
Ah, got it ... after rereading the man page. The part about
/etc/procmailrcs really can apply only when procmail is setuid
root, so again it's something I've no experience with and never
quite followed or retained. So no file in /etc/procmailrcs is ever
used implicitly, but /etc/procmailrc can be.
[Philip] $HOME/.forward is handled by sendmail. If you have a
.forward, then sendmail rewrites attempts to deliver to you into
attempts to deliver to the addresses listed in the .forward file.
Or in other words, the .forward takes precedence over the LDA.
Thank you both.
19.10 Mail should be put in the mailqueue if write fails
...We want to deliver directly to a user's home
directory. But this can of course be temporarily full. Then the
mail should not bounce, but instead be put back in the
mailqueue and tried again until either it succeeds or sendmail
bounces it after 5 days (as usual). The README file says this is
my choice (to bounce or not), but I cannot find any place where
I can set this. What is the correct place to set this behavior?
[1998-06-24 PM-L phil] The -t flag causes procmail to
return EX_TEMPFAIL where it normally would have returned
EX_CANTCREAT. If you've made procmail the local delivery agent then
you should add -t to the A= define, before the -d flag.
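The exit codes named above come from sysexits.h. As a small aid when
reading logs, this shell sketch maps the numeric values to their names.
The numbers are the usual sysexits.h values on common Unix systems; the
function name sysexit_name is made up for this example, and only a few
codes relevant to this document are listed.

```shell
# Translate a numeric exit status into its sysexits.h name.
sysexit_name() {
    case $1 in
        0)  echo EX_OK ;;
        67) echo EX_NOUSER ;;      # addressee unknown
        73) echo EX_CANTCREAT ;;   # can't create (user) output file
        75) echo EX_TEMPFAIL ;;    # temporary failure, try again later
        *)  echo unknown ;;
    esac
}
```

With -t, a failed write is reported as 75 rather than 73, so the
message is kept for a later retry instead of bouncing.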
19.11 Qmail: how to make it work with procmail
[1998-11-10 PM-L John Conover <conover A T inow.com>] All you
do is install fastforward and dot-forward, (they are optional, and
are not required.) Then cp /var/qmail/boot/proc or
/var/qmail/boot/proc+df, to /var/qmail/rc.
[1998-11-10 PM-L Greg Boes <gboes A T ashfordtech.com>] From the qmail
FAQ (4.4 How do I use procmail with qmail?) Put
into ~/.qmail. You'll have to use a full path for procmail unless
procmail is in the system's startup PATH. Note that procmail will
try to deliver to /var/spool/mail/$USER by default; to change this,
see INSTALL.mbox.
19.12 Qmail: Procmail looks file from /var/spool/mail only
...Procmail seems to want to do something in /var/spool/mail. But
since I use qmail, I don't have a /var/spool/mail. Is there a way
to have procmail not to create temp stuff there?
[philip] Get procmail 3.11pre7 and uncomment and correct for your
local setup the MAILSPOOLHOME="/.mail" define in src/authenticate.c.
Compile and install. It's relative to the user's home directory.
Thus the name MAILSPOOLHOME.
[Ekkehard Knopp <knopp A T rz-online.de>] at the qmail-home-page
you can find a patch for procmail-3.11.pre7 called
procmail-maildir-patch. When you can't find it, I can send you a
netmail. Have no problems with procmail and qmail. Works good.
19.13 Qmail: patch to procmail 3.11pre7 to work with Maildirs
[Jaye Mathisen <mrcpu A T cdsnet.net>] On the www.qmail.org page is a
patch that lets procmail 3.11pre7 work with Maildirs (qmail's
NFS-safe delivery format), and not just mailboxes.
Very useful. Really slows down delivery though. On my test box,
just adding procmail to the delivery where all it did was deliver
to the default mailbox, and no other rules whacked my speed test
from something like 600,000 messages/day to about 180,000.
Killer. I suspect Procmail's locking of the Maildir 8 ways from
Sunday is probably partially to blame.
19.14 AFS: How to use Procmail when HOME is in AFS cell
...I've viewed some of the archived posts concerning AFS and
procmail, but each seems to have a different perspective on the
subject. Besides the fact that AFS isn't the greatest product in
the world, does everyone agree that it is not possible to use
procmail when your $HOME lies in an AFS cell? Mail sent locally
seems to work with procmail, but mail from users w/o a token or
AFS id just gets delivered to /var/spool/mail/someone.
[Christopher Lindsey <lindsey A T ncsa.uiuc.edu> 1998-03-09 PM-L] AFS
is awesome! You just have to treat it nicely. :) The only viable
solution that we've been able to come up with involves patching the
procmail-3.11pre7 sources to "fake" user home directories out of
another directory.
For example, my home directory in AFS is
/afs/ncsa.uiuc.edu/.u1/lindsey/ |
It is kept as such on the mail server in /etc/passwd as well.
However, we have some space set up via NFS in /var/forward with
space for each individual user (so /var/forward/lindsey in my
case).
The procmail patch intercepts requests for the user's home
directory and replaces it with the "fake" directory (the
/var/forward one). So for all practical purposes, procmail thinks
that my home directory is /var/forward/lindsey, and everything
works fine.
19.15 Help, some idiot sent my address to 30 mailing lists
You can make a procmail recipe to junk incoming mail from the
lists until you get the unsubscribe messages delivered to cancel
your participation. You should complain to the list's maintainer
that such a thing was even possible: the mailing list should have
sent you a confirmation message with a unique "participate ID
number" that you need to send back in order for the subscription
to take effect.
KILL_FILE = $PMSRC/.kill-immediately
:0
*$ ? $IS_READABLE $KILL_FILE
{
KILL = `cat $KILL_FILE`
}
# 1) Make sure KILL has value
# 2) if match is found from header.
# 3) /dev/null does not need lockfile
:0
* KILL ?? [a-z]
*$ $KILL
/dev/null |
[sean] ...In the long haul, your best bet with dealing with this
problem is to stamp out the offender - bring this harassment to
the attention of their ISP and get their account closed. Repeat as
necessary. Most of the mailing lists should have some record of the
submission request. Even if forged, the abuser probably has their
IP address in the headers somewhere (and if the person is actively
subscribing your friend to so many lists and actually WORKING at
covering their tracks, apparently you've REALLY crossed them). Most
people who stoop to these immature harassment tactics aren't
bright enough to fully cover their tracks.
Another alternative to having to manually deal with unsubs on
certain lists is once you've identified filterable characteristics
of the lists, BOUNCE them. Most semi-intelligent listserv
implementations will unsub you if they get repeated bounces. Yea,
not nice to the listserv maintainer - but then, if perhaps they'd
implement a subscription verification system, it wouldn't have been
a problem to begin with.
:0
* condition
{
# may expose your .forward - but if you're bouncing lists,
# it probably doesn't matter much.
EXITCODE = 67
# save header for examination.
:0 h:
bounce.log
} |
You've got a sticky situation. You can't simply ditch all
unrecognized mail - you need to be able to review potential refuse
first, and take action on anything which doesn't belong (because
you certainly don't want to continue getting the non-wanted lists
till the end of eternity - you should want to unsubscribe from them
to simplify your mail).
19.16 Help, Procmail beeps and prints to my console
...when messages get filtered through procmail I get a beep and
then first 10 lines or so are also sent to the console. I get a
lot of messages so the beeps, and stuff on my screen is getting
very annoying.
[sean] One or the other should do the trick (or both even): Go to
your login file (what it is named depends on the shell you're
using), and add the command "biff n".
Or/also, in your .procmailrc set COMSAT to "no".
[manual] has information on the COMSAT variable. It also states
(contrary to the reasoning I gave above) that COMSAT defaults to
'no' if you specify an rc file on the command line (otherwise, it is
on by default).
Doing this latter one should keep procmail from generating
COMSAT/BIFF notifications, but would still leave your shell capable
of receiving them, say, if you only processed certain mail in
procmail manually or some such. Personally, I turn biff off AND set
the COMSAT off. I read my mail when I read my mail, and I check it
often enough (with a POP client at that).
19.17 Help, procmail dumps mail to console
...I have installed sendmail and procmail on my linux machine
(latest version of slackware) it works ok, but procmail if run
with -d $u dumps all mail after receiving immediately on the
console with ---- more ---- I don't like it, a beep is ok, but I
do not want all the garbage on my screen. Is there a way to tell
procmail that I just want the mail in my mailbox
(/var/spool/mail/$u) ? Thanks for the help!
[Xavier Beaudouin <kiwi A T oav.net>] Check your /etc/inetd.conf for a
in.comsat, add a '#' at the beginning of the line, save the file
and killall -HUP inetd. This should stop this ;-)
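The edit described above can also be previewed with sed before touching
the real file. This is only a sketch: it assumes the inetd.conf entry
begins with the service name "comsat" (some systems use a different
name, so check your own file), and the function name disable_comsat is
invented here. The result is written to standard output and the input
file is left alone.

```shell
# Print an inetd.conf with the comsat service line commented out.
# Usage: disable_comsat /etc/inetd.conf > inetd.conf.new
disable_comsat() {
    # Prefix '#' to a line whose first field is the service "comsat"
    sed 's/^\([[:space:]]*comsat[[:space:]]\)/#\1/' "$1"
}
```

After reviewing and installing the new file, inetd still has to be told
to reread it, as the answer says.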
19.18 Help, corrupted From_ line in mailbox
[Jeffrey S. Gilton <jeffg A T castlec.com>
1998-02-11 in procmail mailing list " Solved the FFrom problem"]
Thanks to everyone who responded to my questions about a problem where
the From line was getting corrupted. Here I tell what was the real
problem.
To recap, when our Caldera OpenLinux 1.1 system received multiple
mail messages very quickly, some messages would get multiple F's on
the from line and then subsequent messages would be missing the F's.
Most responses said that it sounded like a file locking problem.
Suggested solutions were to get the latest version of procmail or
recompile our version so that it would look at the file locking
mechanisms.
The funny thing was that three systems with new installs didn't
exhibit the problem.
The file locking recommendation eventually led to the real problem. On
a good system I would run our spam script (we spammed ourselves to
trigger the problem) and everything would work. Using top I would see
multiple instances of procmail running. Looking at the directory where
the spool files were, I would see a spool_file.lock file get created
and then go away.
Finally, I did the exact same thing on the system that wouldn't work.
There I would see the multiple instances of procmail running but no
lock file being created. I said to myself "Now that I know what is
happening, the question is why."
It turned out to be a permission problem on the spool directory. On
the system that worked, the permissions were rwxrwxr-x with the owner
being root and the group being mail. On the system that didn't work,
the permissions were rwxr-xr-x with the owner and group being root.
This meant that procmail, which runs as mail, couldn't write to the
directory, so it could not create the lock file. We changed the broken
system to rwxrwxr-x with owner root and group mail. The problem
disappeared.
As I said, the suggestions about lock files were key. They guided our
investigation until we found the real problem. I thank everyone who
responded.
I've seen other postings about corruption of the From line. Perhaps you
have the same problem.
[Christopher B. Smith <cbsmith A T envise.com>] I had the exact same
problem with my upgraded OpenLinux system. For the record, if you are
running the imapd that comes with it, you should really set the
permissions on the directory as follows:
chmod 1777 /var/mail/spool
I got that feedback from the guy who wrote imapd, and it works very
well.
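The permission check that exposed the problem can be sketched in shell.
This is only an illustration, not part of the original posts: the helper
name spool_group_writable is made up, it assumes the usual "ls -ld"
output where the mode string is the first field, and it runs against a
scratch directory rather than a real mail spool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory's mode string
# (first field of `ls -ld`) has the group-write bit set, e.g. drwxrwxr-x.
spool_group_writable() {
    mode=$(ls -ld "$1" | awk '{print $1}')
    case "$mode" in
        d????w*) return 0 ;;   # 6th character is the group 'w' bit
        *)       return 1 ;;
    esac
}

# Demonstration on a scratch directory instead of the real spool.
demo=$(mktemp -d)

chmod 755 "$demo"              # the mode seen on the broken system
if spool_group_writable "$demo"; then before=ok; else before=broken; fi

chmod 775 "$demo"              # the mode seen on the working system
if spool_group_writable "$demo"; then after=ok; else after=broken; fi

echo "755: $before, 775: $after"
rmdir "$demo"
```

The group-write bit only helps if the directory's group is the one
procmail runs under (mail in the report above), so check the group
column of "ls -ld" as well.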
19.19 Directing user's mail to HOME instead of /var/spool/
...I have a need to direct all of a single user's mail to a mailbox in
his home directory, to $HOME/mailbox.
# One possible solution, not perfect
UHOME = /tmp_mnt/users
UHOME_LIST = "(login1|login2|login3)"
:0 :
*$ ^TO\/$UHOME_LIST@
* MATCH ?? ()\/[^@]+
$UHOME/$MATCH
[era] Perhaps preferably use ^TO_ if you have Procmail 3.11pre7
or newer. This is the classical case of using Procmail where you
really need the envelope recipient information. The headers are
not enough to determine who a message is for. If Procmail is your
MDA, you can have this, but I'd still think something involving
Sendmail would be more appropriate. For one thing, what if this
user would suddenly really want to use Procmail? You can set
DEFAULT and ORGMAIL for this one user in /etc/procmailrc to get
around that, but the bottom line, as so many times before, is that
Procmail might not be the right tool for this.
19.20 NFS mounting /var/mail is a good way to get bad performance
Procmail mailing list 1998-06
> /var/mail lives on a Solaris 2.5 machine. Cucipop is running
> on the same machine, and it works fine there. But I want to have
> more than one machine with cucipop, and when I put cucipop on
> other machines (NFS clients), closing the session is delayed by
> more than 30 or 40 seconds.
[1998-06-23 PM-L Brad Knowles <brad A T colltech.com>] NFS mounting
/var/mail is a good way to get bad performance, especially when
you're doing any NFS writes. Even if you're not doing any NFS
writes, just having to deal with local file locking and trying to
translate that into NFS file locking is a nightmare (in general,
file locking is one of the single biggest problems left with NFS).
> Procmail works well on NFS, it finishes quickly. But when
> cucipop is put on an NFS client, procmail starts to be delayed too.
Procmail probably isn't writing to NFS, or if it is, it's probably
not using the same locking mechanism as cucipop. Unfortunately,
each vendor and each program have their own ideas on how to best do
that.
[philip] cucipop was written by the author of procmail. Ideally,
when you compile cucipop you edit its config.h to use the locking
techniques that procmail's autoconf process determined for your
system(s). However, even if you didn't do that, cucipop uses the
same dotlocking algorithm as procmail.
Also, keep in mind that any POP3 server will have to copy the
mailbox in order to work on it, and many of them copy the mailbox
to /var/mail/.username (you got it – creating lots of NFS writes).
When they're done, they copy the mailbox back to /var/mail/username
(after they copy any new mail messages that have come in to the end
of /var/mail/.username and locked then truncated the original
/var/mail/username file).
[philip] cucipop doesn't use a temporary file: it keeps it all in
memory. On deletes it updates the mailspool in place which should
never lose data, though if the server crashes in the middle of
this you can end up with one or more bogus messages.
This is a real nightmare when you start talking about users who
select "Leave mail on server" and have multi-megabyte mailboxes.
[philip] Assuming you have enough memory, cucipop should be pretty
fast.
I think maybe now you're starting to understand why POP3 really
doesn't scale well at all in multi-machine environments (unless
you've cooked up a custom mail store that uses a real database
back-end, like Oracle Parallel Server), with /bin/mail (or
procmail) as a writable interface to this message store and POP3
and/or IMAP as a readable (and writable) interface to this same
message store. Then you can let the database vendors deal with the
hard data replication and distribution problems.
Otherwise, it's a pain-in-the-ass.
> Is there another good pop server?
Have you tried QPopper from Qualcomm? It's the single best POP3
server I've ever run across, although I wouldn't put even it in an
NFS write environment.
BTW, I used to be the Mail Systems Administrator for GNN (Global
Network Navigator), the web site/National ISP co-operative between
O'Reilly & Assoc. and AOL. At our peak, we had hundreds of
thousands of registered users, of which up to five to six thousand
were logged in at any one time, with their MUA set to check their
mail every minute.
We had a single primary Mail/POP3 server machine (Dec Alpha 2100 w/
four 250MHz processors, 4GB RAM, 28GB hardware mirrored/striped
mail spool), and one warm spare (same CPU/RAM configuration,
physically hooked up to the same disks, but through DECsafe ASE not
mounting them unless the primary died).
19.21 I can't see the sendmail's response in LOGFILE
...As the man page says, this should've written to my LOGFILE. It
didn't. But it DID activate the pipe in the recipe. So what's up
here?
:0 hc
*$ ? $IS_EXIST $HOME/.vacation
| LOG=| ($FORMAIL -r; echo $IM_NOT_HERE) | $SENDMAIL -t
[david] The man page says that a variable capture recipe assigns the
standard output of the command to the variable. Since you are piping
the output of formail and echo on to sendmail, sendmail sucks up the
standard output of formail and echo. Sendmail itself does not
write to standard output, so the stdout of ( $FORMAIL -r ; echo
$IM_NOT_HERE ) | $SENDMAIL -t is nothing.
Thus you're assigning a null string to $LOG, and when procmail writes
$LOG to the logfile you can't see a difference.
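The same effect can be reproduced in plain shell. This is a minimal
sketch: fake_sendmail is a made-up stand-in for "$SENDMAIL -t" (it
reads all of its input and prints nothing), and the two echo commands
stand in for formail and the canned reply text.

```shell
#!/bin/sh
# Stand-in for "$SENDMAIL -t": consumes stdin, writes nothing to stdout.
fake_sendmail() { cat >/dev/null; }

# Same shape as the recipe's command line: the capture only sees the
# stdout of the LAST command in the pipeline, which is empty.
captured=$( (echo "reply headers"; echo "I am not here") | fake_sendmail )
echo "captured: [$captured]"

# Capturing before the final stage gives the text that was passed on.
generated=$( (echo "reply headers"; echo "I am not here") )
echo "generated: [$generated]"
```

The first capture is empty for the same reason LOG stays empty: the
text went into the last command's standard input, not to the capture.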
19.22 Compiling procmail and choosing locking scheme
General advice: Everything except dot locking is usually broken.
[stephen, <199607292139.XAA12433@hera.cuci.nl>] Remove fcntl() and
lockf(), only allow flock() (or omit it completely). Kernel locks
don't work. But that's all some programs use. Across a networked
filesystem, lockf() doesn't work, fcntl() and flock() should, but
they don't either because the lockd is buggy. Mailtool uses fcntl()
but does it wrong, so that's another problem. The only thing that
works on all platforms, all networks, all the time are .lock files.
Makefile refers to:
#LOCKINGTEST=100   # Uncomment (and change) if you think you know
                   # it better than the autoconf lockingtests.
                   # This will cause the lockingtests to be hotwired.
                   # 100 to enable fcntl()
                   # 010 to enable lockf()
                   # 001 to enable flock()
                   # Or them together to get the desired combination.
config.h refers to:
/*#define NO_fcntl_LOCK uncomment any of these three if you */
/*#define NO_lockf_LOCK definitely do not want procmail to make */
/*#define NO_flock_LOCK use of those kernel-locking methods */
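As an illustration of "Or them together", here is a small shell sketch
that builds a LOCKINGTEST value from method names. The helper name
locking_mask is made up for this example and is not part of the
procmail sources; it simply assumes each of the three digits is an
independent on/off switch, as the Makefile comment suggests.

```shell
#!/bin/sh
# Hypothetical helper: one digit per kernel-locking method, in the
# order used by the Makefile comment (fcntl, lockf, flock).
locking_mask() {
    fcntl=0 lockf=0 flock=0
    for method in "$@"; do
        case "$method" in
            fcntl) fcntl=1 ;;
            lockf) lockf=1 ;;
            flock) flock=1 ;;
        esac
    done
    echo "$fcntl$lockf$flock"
}

both=$(locking_mask fcntl flock)    # 100 or'ed with 001
only_flock=$(locking_mask flock)
echo "LOCKINGTEST=$both"
echo "LOCKINGTEST=$only_flock"
```

Given the general advice above, dot locking stays in use whatever
value is chosen here.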
19.23 Forwarding lot of mail causes heavy load
...There are several forward (e.g. ! walter@localhost) recipes.
For every forwarded mail, a distinct sendmail process is created.
This leads to a heavy (IMHO unbearable) system load. How can I
stop procmail from running a sendmail process for every mail
forwarded?
SUMMARY: Look at qmail, it's better than sendmail.
[era 1998-08-15 PM-L] (Blows dust off old underutilized Bat
Book/ORA sendmail book) Yeah, setting QueueFactor (q) and QueueLA
(x) to suitable values should do what you want. You need to have
load-balancing support compiled in, though; according to the Bat
Book, sendmail -d3.1 tells whether you have it or not. (Mine just
says getla:0 which I would imagine means I have the support but the
load average was below the cutoff level.)
AFAIK using load averaging would have the first messages
delivered and the rest queued. However, also not being a sendmail
guru, I do not know how to empty a sendmail queue for incoming
mail only. Moreover, even if I knew how to do this, it would have
to be done after procmail finishes.
[Liviu Daia <daia A T stoilow.imar.ro>] Instruct sendmail to queue
messages when called from procmail:
SENDMAILFLAGS="-oi -od d"
then disable the normal sendmail daemon from your system init
scripts, and run it in flush queue mode only, that is, replace
/usr/sbin/sendmail -bd -q 15m
in your init scripts with
/usr/sbin/sendmail -q 15m
("15m" is how often the queue will be run, here every 15 minutes;
change it to whatever is appropriate for your purposes.) Also make
sure to disable forking in your sendmail.cf.
The downside of this approach is that it will also delay the
delivery of local messages.
Different approach: pipe messages to sendmail instead of using
'!' and use the wait flag. Something along the lines of:
:0 w
* conditions
| $SENDMAIL $SENDMAILFLAGS <recipients>
Well, I'm actually not sure you can use the 'w' flag without 'f'
(the manual doesn't say it, and I'm not too familiar with procmail
internals), so if that doesn't work you might also try:
:0 fw
* conditions
| $SENDMAIL $SENDMAILFLAGS <recipients>
# dummy recipe to stop procmail from delivering an empty message
:0 a
/dev/null
Sendmail will rewrite the From_ header (which you can probably safely
ignore), and it will (optionally) add a From: if one doesn't
exist, but it won't touch an existing From:. Well, actually it
will encode or decode any 8-bit characters in the From: according
to the options in sendmail.cf, but it won't change the meaning of
the "From:". In fact, that's exactly what procmail does too in the
'!' recipes.
19.24 What happens to mail if MDA Procmail fails
...When procmail is the local mailing agent distributing
e-mail to a user's $HOME and the target machine is 'down', where
does the e-mail go? I was given the impression that the mail
would be collected on the 'mailhub' in /usr/mail/BOGUS.xxx
(Solaris system). It is not happening and we have the potential
of losing mail.
[philip] I assume that by "target machine" you mean the NFS
server for the given user's account. Procmail's attempt to read
~/.procmailrc will time out, then when it tries to write to $DEFAULT
(which you say is in their home directory) it'll time out (again)
and return EX_CANTCREAT to sendmail. Sendmail will then presumably
bounce the message.
Now, if sendmail is looking for .forward files in user home
directories, then procmail will never be called, as sendmail will
try to open the .forward file and consider it a transient error
when it times out, causing the message to be queued for a later
delivery attempt.
(Note: invoking procmail with the -t flag causes it to return
EX_TEMPFAIL instead of EX_CANTCREAT. This would cause the message
to be requeued. However, this is not generally recommended.)
19.25 Procmail reads entire 90Mb message into memory
...last week my workstation ground to a halt when procmail received a
90Mb Email message (ran out of memory). The point is, such
message sizes are fine by me, as long as the system can handle
it. Is there any way I could make procmail only read the headers
of that message before scanning /etc/procmailrc or ~/.procmailrc and
acting on it? That way it wouldn't need to read the entire
message into memory.
...Recently, I modified the sendmail.cf file to pipe messages
through procmail before sending them to deliver, so that I can
use system-wide procmail recipes for spam filtering. However,
yesterday we had a client send a 22 megabyte e-mail message (on
purpose, no less) and the system just came to its knees trying to
deliver it to the user's mailbox.
[philip] Btw, All the versions of /bin/mail (or mail.local) that
I've seen the source for either read the entire message into memory
first or use a temp file. Depending on where temp files are
located, a 90MB temp file may be just as bad as holding it in
memory.
And, no, there isn't. Hacking it in would be non-trivial, mainly
because the current code runs with the assumption that the entire
message is there, and determining when it actually needs to see the
entire body (to do demand loading) would not be easy. Remember
that a condition on the size of the message, a la
:0
* > 10000000
/dev/null
would require the body to be read... It really is just better to
simply have sendmail enforce the limit. You should be doing it
there anyway to cut down on the totally trivial denial-of-service
attacks and because it's more efficient.
...I am running procmail ver 3.11pre7 and I keep getting
"out of memory as i tried to allocate 8xxxxxx bytes.". I
have over 100 meg of available swap space so I have a difficult
time understanding this. Is this a known error?
Procmail's memory allocation technique appears to be non-optimal for
some OS/libc combinations, depending on how the libc function
realloc() is implemented (FreeBSD has been reported). It's conceivable
that the configuration process could be enhanced to detect this
system limitation and use a strategy that is more efficient on such
systems. Don't hold your breath.
19.26 Procmail signaled out of memory in my verbose log
...I notice in my procmail verbose log the following
'transaction':
procmail: [10239] Sat Jan 9 08:49:02 1999
procmail: Out of memory
buffer 0: "formail"
buffer 1: " formail -A "X-Check: List""
Folder: **Bounced** 5744
procmail: Notified comsat: "bhoule@:**Bounced**"
If I act quickly enough when this happens, I can look in
spool/mqueue and find a message with a gazillion addresses in the
To: line. So it seems that formail is having trouble adding my
X-Check header to an already large set of headers.
[philip] No, it's procmail that's unable to allocate enough memory.
The buffer dumps indicate that procmail was unable to get enough
memory somewhere between parsing the action line and reaching the
next recipe – buffer 0 would not contain the string "formail" if
procmail had gotten to another recipe or variable assignment.
What's weird is that the message is so small (only 5744 bytes
according to procmail). Do you only see this error on this recipe,
or at random places in your .procmailrc? If the latter, then I would
guess that your mailserver is running out of memory for some other
reason and that procmail happens to be an innocent bystander. If
the former, then, well, I'm not sure.
The message is never delivered to me. Is there anything I can do
so that procmail/formail will act as if it was never there, so the
incoming message is dumped into my inbox rather than returning an
error to the mailer? This "**Bounced**" business is not a very
helpful action.
Giving procmail the -t flag will cause fatal internal errors that
are normally returned as permanent errors to be returned as
temporary failures instead. Otherwise there's no way to control
that. (Setting EXITCODE won't work because procmail needs to malloc
memory to handle TRAP and EXITCODE, and it'll refuse to try that
when it was malloc that caused the exit.)
19.27 Variables DEFAULT and ORGMAIL
...According to the man pages, DEFAULT is defined as ORGMAIL
...so if I redefine ORGMAIL, then DEFAULT should change as
well, which doesn't help me. Any help on this would be
appreciated
[david] DEFAULT is initially defined as equal to ORGMAIL. Once
procmail has started reading /etc/procmailrc (if it is the MDA) or
your .procmailrc, you can change the value of either without affecting
the other.
In fact, you can even set DEFAULT on the command line when you invoke
procmail (I'm not sure about doing that with ORGMAIL, though), and
that value will override its normal initial value equal to ORGMAIL.
What if delivery to DEFAULT can fail because the disk is full?
Then you had better have another drop place on another file system.
Peek at bdf(1) or df(1) to find out the different mounted file systems.
# Place this at the end of your .procmailrc and define
# DEFAULT_SECONDARY
:0 :
$DEFAULT
# 'e' = run only if the preceding recipe was tried and failed
:0 e:
$DEFAULT_SECONDARY
If you deliver explicitly to $DEFAULT, procmail treats it like any
other save-to-folder recipe, and if the write fails, it continues
reading recipes.
...If I had set the "deliver" destination as ORGMAIL rather than
DEFAULT, would it have made any difference?
Nope. If you write a recipe for it, procmail just expands the variable
and doesn't give a heck if it happens to be the same destination as
DEFAULT or ORGMAIL. DEFAULT is special to procmail only when it
uses it on its own after falling off the end of the rcfile; ORGMAIL
is special only at startup (without -m) and when procmail falls off
the end of the rcfile and finds that it cannot save the message to
DEFAULT.
In general, if procmail falls off the end of the rcfile,
fails to save to DEFAULT, and then fails to save to ORGMAIL,
does it revert to the compiled-in value of ORGMAIL ?
[philip] Procmail has no fallback beyond the current value of
ORGMAIL. If delivery to both DEFAULT and ORGMAIL fail, then
procmail gives up and exits with error code 73 (EX_CANTCREAT) or 75
(EX_TEMPFAIL), depending on whether the -t flag was given. Setting
EXITCODE would probably override those. The message is logged
as "**Bounced**".
19.28 When DEFAULT cannot be mailed to
If procmail gets to the end of the rcfile without delivery (or without
being directed to another rcfile by an INCLUDERC or HOST assignment),
it assumes these:
:0:
$DEFAULT
:0 e:
$ORGMAIL
That is, it tries to deliver to $DEFAULT and if it can't, it tries
$ORGMAIL. If that fails too ("deep, deep trouble" as Stephen says in
the man page), it exits without delivery and reports failure to the
MTA, which, depending on other factors, will either requeue the letter
and try delivering later or will bounce it to the sender.
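The fallback order can be sketched as a shell function. This is an
illustration only, not procmail's implementation: the name
deliver_with_fallback and its arguments are made up, plain appends
stand in for procmail's locked delivery, and the exit codes are the
EX_CANTCREAT (73) and EX_TEMPFAIL (75) values quoted in the previous
section.

```shell
#!/bin/sh
# Hypothetical sketch of the end-of-rcfile fallback.
# Args: message file, DEFAULT path, ORGMAIL path, "t" if -t was given.
deliver_with_fallback() {
    msg=$1 default=$2 orgmail=$3 tflag=$4
    # First try $DEFAULT, then $ORGMAIL (no locking in this sketch).
    if cat "$msg" 2>/dev/null >>"$default"; then return 0; fi
    if cat "$msg" 2>/dev/null >>"$orgmail"; then return 0; fi
    # Both failed: report a temporary or a permanent error to the MTA.
    if [ "$tflag" = t ]; then return 75; fi    # EX_TEMPFAIL
    return 73                                   # EX_CANTCREAT
}

# Demonstration in a scratch directory; "missing" does not exist, so
# any path below it cannot be created.
tmp=$(mktemp -d)
printf 'From demo\n\nbody\n' >"$tmp/msg"

rc_fallback=0
deliver_with_fallback "$tmp/msg" "$tmp/missing/default" "$tmp/orgmail" || rc_fallback=$?
saved=$(cat "$tmp/orgmail")

rc_perm=0
deliver_with_fallback "$tmp/msg" "$tmp/missing/default" "$tmp/missing/orgmail" || rc_perm=$?

rc_temp=0
deliver_with_fallback "$tmp/msg" "$tmp/missing/default" "$tmp/missing/orgmail" t || rc_temp=$?

echo "fallback=$rc_fallback permanent=$rc_perm temporary=$rc_temp"
rm -rf "$tmp"
```

With a temporary failure the MTA will normally requeue the message,
with a permanent one it will normally bounce it.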
19.29 Variable DROPPRIVS
...I have procmail invoked from a mailtable for a virtual domain.
Presently that runs as root, inherited from sendmail. I'd like to
have it run less privileged. I tried chown'ing the rc file to the
user I want used and setting "DROPPRIVS=yes". That didn't do it.
So I added "LOGNAME=user" and "USER=$LOGNAME" before the
DROPPRIVS assignment and that didn't work.
[philip] DROPPRIVS only has an effect inside the /etc/procmailrc used
when procmail is running in delivery mode (-d), not when it's
running in mail filter mode (-m). USER and LOGNAME have no effect on
the working of DROPPRIVS, as procmail is just going to change to
the uid/gid of the user specified on the command line after the -d.
Your mailtable entry should be specifying the procmail mailer,
which runs procmail in mail filter mode.
If the following are true:
- procmail is running in mail filter mode
- no assignments were given on the command line
- the -p flag was not specified
- the rcfile specified is located under /etc/procmailrcs/ without
backwards references ("/../"s)
- the rcfile is not a directory (duh!)
then procmail will assume the uid and gid of the owner of the
rcfile. If the rcfile is actually a symlink, then procmail will
assume the uid and gid of the link itself, not the underlying file.
If your OS allows anyone to give away ownership of files with
chown, then procmail adds the following restriction to those above:
/etc/procmailrcs must be owned by root and mode 700.
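Two of the conditions above concern only the rcfile path and can be
pre-checked in shell before editing a mailer definition. This is a
rough sketch: the helper name rcfile_path_ok and the sample paths are
made up, and it does not check ownership, the -p flag, command-line
assignments, or whether the path is a directory.

```shell
#!/bin/sh
# Hypothetical check: the rcfile path must be under /etc/procmailrcs/
# and must not contain a "/../" backwards reference.
rcfile_path_ok() {
    case "$1" in
        */../*)              return 1 ;;   # backwards reference
        /etc/procmailrcs/?*) return 0 ;;   # under the special directory
        *)                   return 1 ;;   # anywhere else
    esac
}

for rc in /etc/procmailrcs/lists.rc \
          /etc/procmailrcs/../passwd \
          /home/user/.procmailrc
do
    if rcfile_path_ok "$rc"; then
        echo "$rc: path qualifies"
    else
        echo "$rc: path does not qualify"
    fi
done
```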
19.30 Variable HOME
[david] Since procmail doesn't understand tilde, you have to use
variable HOME instead.
CONTENT = `cat ~/file.txt` # Won't work
CONTENT = `cat $HOME/file.txt` # ok
But accessing another user's home directory is another story. You
could change the SHELL temporarily to get procmail to understand the
reference, like this:
SHELL = /bin/csh
CONTENT = `cat ~user/file.txt`
SHELL = /bin/sh