3. Parsing CTCP Messages

A CTCP message is:

CTCP-MESSAGE	= CTCP-MARKER CTCP-TAG *( SP CTCP-TAG ) CTCP-MARKER
where:
CTCP-MARKER	= <US-ASCII character 1 (octal 001)>
CTCP-PERMITTED	= <any octet except NUL, CTCP-MARKER, LF, CR and SP>
CTCP-TAG	= 1*CTCP-PERMITTED
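
As an illustration only, the grammar above can be checked with a regular expression over raw octets. This is a minimal sketch, not part of the protocol; the names CTCP_MESSAGE_RE and is_ctcp_message are hypothetical.

```python
import re

# CTCP-PERMITTED: any octet except NUL, CTCP-MARKER (octal 001), LF, CR and SP.
_TAG = rb"[^\x00\x01\n\r ]+"

# CTCP-MESSAGE = CTCP-MARKER CTCP-TAG *( SP CTCP-TAG ) CTCP-MARKER
CTCP_MESSAGE_RE = re.compile(rb"\x01" + _TAG + rb"(?: " + _TAG + rb")*\x01")


def is_ctcp_message(data: bytes) -> bool:
    """Return True if data is exactly one CTCP-MESSAGE per the grammar above."""
    return CTCP_MESSAGE_RE.fullmatch(data) is not None
```

Under this sketch, an empty message (two adjacent markers), a doubled SP, or a trailing SP before the closing marker is rejected, because each SP must be followed by a non-empty CTCP-TAG.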

There are two types of CTCP messages: "request" and "reply". Generally, a request is generated by the user, and a reply is the automated answer that a remote client sends back. For simplicity, a request and its reply use the same keyword, encoded in different types of IRC messages. Any CTCP message encoded in an IRC PRIVMSG is a request. Any CTCP message encoded in an IRC NOTICE is a reply.

On non-IRC media (such as DCC CHAT), a reply may be distinguished from a request by preceding the keyword with a forward slash ("/"). For example, over a DCC CHAT connection, a CTCP "PING" request would be answered with a CTCP "/PING". This is only valid for non-IRC connections. On IRC, the PRIVMSG/NOTICE method of distinguishing a request from a reply prevails.
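
The request/reply distinction described above might be expressed as follows. This is a sketch under assumptions: the function name classify_ctcp is hypothetical, and passing None for the IRC command is merely this example's way of marking a non-IRC transport such as DCC CHAT.

```python
from typing import Optional, Tuple


def classify_ctcp(keyword: bytes, irc_command: Optional[str]) -> Tuple[str, bytes]:
    """Return (kind, keyword), where kind is "request" or "reply".

    irc_command is "PRIVMSG" or "NOTICE" on IRC, or None for a non-IRC
    transport such as DCC CHAT (an assumption of this sketch).
    """
    if irc_command == "PRIVMSG":
        return "request", keyword
    if irc_command == "NOTICE":
        return "reply", keyword
    if irc_command is None:
        # Only on non-IRC connections does a leading "/" mark a reply.
        if keyword.startswith(b"/"):
            return "reply", keyword[1:]
        return "request", keyword
    raise ValueError("CTCP is only carried in PRIVMSG or NOTICE on IRC")
```

Note that on IRC the slash is not interpreted at all, since the PRIVMSG/NOTICE distinction prevails there.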

The first CTCP-TAG is the keyword that identifies the request or reply. The other tags are parameters which are specific to that request. The keyword is case-sensitive. All existing keywords are made up entirely of upper-case letters. The use of lower-case letters in a CTCP keyword is reserved.

While many existing clients send each CTCP message in its own IRC message, with no other text (the entire message is a CTCP message), this is not required. A client must be able to decode CTCP messages embedded within normal text. The rule is: the 1st, 3rd, etc. (odd-numbered) CTCP-MARKER within a message starts a CTCP message, and the 2nd, 4th, etc. (even-numbered) CTCP-MARKER stops it. A message with an odd number of CTCP-MARKERs is invalid.

Example: (^A is used to represent the CTCP-MARKER)

Hello Ja^APING 34^Ane! How's the we^AVERSION^Aather?

There are two CTCP messages encoded: "PING 34" and "VERSION". The text (after extracting the CTCP messages) is "Hello Jane! How's the weather?"

This is an extreme case of CTCP embedding, but all CTCP-capable clients are expected to be able to handle it.
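
The odd/even marker rule can be sketched by splitting the message body on the CTCP-MARKER: the pieces at even positions are normal text, and those at odd positions are CTCP messages. The function name split_ctcp is hypothetical, and the sketch does not validate each extracted message against the grammar.

```python
CTCP_MARKER = b"\x01"


def split_ctcp(body: bytes):
    """Return (plain_text, ctcp_messages) for one PRIVMSG/NOTICE body."""
    parts = body.split(CTCP_MARKER)
    # n markers produce n + 1 parts, so an even part count means an odd
    # number of markers, which makes the message invalid.
    if len(parts) % 2 == 0:
        raise ValueError("odd number of CTCP markers")
    plain_text = b"".join(parts[0::2])   # text outside marker pairs
    ctcp_messages = parts[1::2]          # text between marker pairs, left to right
    return plain_text, ctcp_messages
```

Applied to the example above, this yields the text "Hello Jane! How's the weather?" and the messages "PING 34" and "VERSION", in that order.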

IRC traffic containing multiple valid CTCP messages should be processed in left-to-right order. Requests generating replies may or may not be combined into a single message, but the replies must be returned in the same order as the requests were processed.

3.1 Quoting

There is a set of octets which the IRC protocol is incapable of sending in a normal PRIVMSG or NOTICE message. Because of this, IRC is not considered "8-bit clean". In order to provide for 8-bit clean CTCP messages, quoting is introduced to translate some characters on transmission and receipt -- just as an email containing binary data requires MIME, IRC messages containing binary data require CTCP. Quoting is also needed for certain characters (such as the CTCP marker) which are used by CTCP.

Listed below are the characters which require quoting, as well as their quoted equivalents.

                         ASCII             Quoted
Name                     (Octal)   Quoted  (Octal)
-----------------------  -------   ------  -------
NUL  (null)              000       '\0'    134 060
SOH  (ctcp marker)       001       '\1'    134 061
LF   (newline)           012       '\n'    134 156
CR   (carriage return)   015       '\r'    134 162
SPC  (space)             040       '\@'    134 100
\    (backslash)         134       '\\'    134 134
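
The table above maps directly onto a pair of lookup tables. The following is a minimal sketch; the names ctcp_quote and ctcp_unquote are hypothetical. The handling of a backslash followed by an octet not listed in the table (the backslash is dropped) and of a trailing lone backslash (kept as-is) are assumptions of this example, since the table does not define those cases.

```python
# Raw octet -> quoted form, taken from the table above.
_QUOTE = {
    0x00: b"\\0",   # NUL
    0x01: b"\\1",   # CTCP marker
    0x0A: b"\\n",   # LF
    0x0D: b"\\r",   # CR
    0x20: b"\\@",   # space
    0x5C: b"\\\\",  # backslash
}
# Octet following the backslash -> raw octet.
_UNQUOTE = {quoted[1]: bytes([raw]) for raw, quoted in _QUOTE.items()}


def ctcp_quote(arg: bytes) -> bytes:
    """Quote one argument so it contains no NUL, marker, LF, CR or SP."""
    return b"".join(_QUOTE.get(octet, bytes([octet])) for octet in arg)


def ctcp_unquote(arg: bytes) -> bytes:
    """Reverse ctcp_quote for one argument."""
    out = bytearray()
    i = 0
    while i < len(arg):
        if arg[i] == 0x5C and i + 1 < len(arg):
            nxt = arg[i + 1]
            # Assumption: an unknown escape yields the octet itself.
            out += _UNQUOTE.get(nxt, bytes([nxt]))
            i += 2
        else:
            # Ordinary octet, or a lone trailing backslash (kept as-is).
            out.append(arg[i])
            i += 1
    return bytes(out)
```

Because the backslash itself is always quoted, unquoting a quoted argument returns the original octets for any input.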

The original CTCP protocol defined a two-level method of quoting, which was rarely (if ever) implemented. Because few clients currently support this quoting method, and it is generally considered too complex, we have scrapped the original CTCP quoting.

The CTCP2 protocol introduces quoting at the argument level -- that is, the characters in an argument to a CTCP message are translated according to quoting rules before they are combined into a full CTCP message. Below is the order used in parsing incoming messages. This will be involved in the parsing of CTCP messages embedded within PRIVMSG, NOTICE, and DCC CHAT traffic.

  1. Break the message into tokens using SP.
  2. Interpret the first token as the CTCP keyword.
  3. Unquote each following token using the quoting table above. (For efficiency reasons, the keyword token does not need to be unquoted, since none of the current valid keyword tokens contain characters that need unquoting.)
  4. Process the CTCP message according to its keyword.

Likewise, quoting should be applied to all arguments before being concatenated into a CTCP request. However, none of the current CTCP keywords contain characters that need quoting, so an implementation may skip the quoting step for the keyword token.
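
The parsing order above can be sketched as follows. The input is assumed to be the text found between a pair of CTCP markers. The names parse_ctcp and _unquote are hypothetical, and an escape not listed in the quoting table is assumed to yield the escaped octet itself.

```python
import re

# Octet after the backslash -> raw octet, per the quoting table.
_UNQUOTE = {
    b"0": b"\x00", b"1": b"\x01", b"n": b"\n",
    b"r": b"\r", b"@": b" ", b"\\": b"\\",
}


def _unquote(token: bytes) -> bytes:
    # Replace each backslash pair; unknown escapes keep the escaped octet.
    return re.sub(
        rb"\\(.)",
        lambda m: _UNQUOTE.get(m.group(1), m.group(1)),
        token,
        flags=re.DOTALL,
    )


def parse_ctcp(message: bytes):
    """Return (keyword, arguments) for the text between two CTCP markers."""
    tokens = message.split(b" ")                   # 1. break into tokens on SP
    keyword = tokens[0]                            # 2. first token is the keyword
    arguments = [_unquote(t) for t in tokens[1:]]  # 3. unquote the other tokens
    return keyword, arguments                      # 4. dispatch is left to the caller
```

For instance, parse_ctcp(b"PING 34") gives the keyword "PING" with the single argument "34". Splitting before unquoting matters: a quoted space ('\@') inside an argument must not be treated as a token separator.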

3.2 "Flooding" Policy

It is not required that an IRC client respond to more than one CTCP request per IRC message, but if multiple requests are answered, the replies must be sent in the same order as the requests arrived. RFC 1459 indicates that servers will contain flood control mechanisms, which will disconnect clients that send excessive amounts of text to their server within short periods of time. Because less desirable elements of the IRC community exploit this mechanism (by provoking a client into sending enough replies to be disconnected) to take over channels, gain access to nicknames that are otherwise in use, and disrupt the pursuit of enjoyment, each client must take steps to avoid this outcome.

For this reason, CTCP requests may be ignored by the client, based on selected criteria. Each client must determine, for the benefit of its users, what means it will provide to ensure that excessive text is not sent to the server. One popular method is to send only N CTCP reply messages per M seconds (where the ratio M/N varies dramatically, but should probably be at least 2). It is also permissible to reject requests from users who have sent "too many" (for any definition of "too many") requests, or users whom the client has been instructed to ignore.
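
One way to realize the "N replies per M seconds" method is a sliding window over the times of recent replies. This is a sketch only: the class name ReplyLimiter, its parameters, and the default values are hypothetical, and each client remains free to choose its own criteria.

```python
import time
from collections import deque


class ReplyLimiter:
    """Allow at most max_replies CTCP replies in any per_seconds window."""

    def __init__(self, max_replies=3, per_seconds=6.0, clock=time.monotonic):
        self._max_replies = max_replies
        self._per_seconds = per_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._sent = deque()     # send times of replies still inside the window

    def allow(self) -> bool:
        """Return True and record the reply if one may be sent now."""
        now = self._clock()
        # Forget replies that have fallen out of the window.
        while self._sent and now - self._sent[0] >= self._per_seconds:
            self._sent.popleft()
        if len(self._sent) < self._max_replies:
            self._sent.append(now)
            return True
        return False             # caller should ignore this request
```

A client would call allow() before answering each request and silently drop the request when it returns False. The defaults give M/N = 2, the lower bound suggested above.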

Maintained by: Webmaster, Innovative Logic Corp.
Last modified: 17-Sep-2004 01:04PM.