By Peter Olson

2011-10-19 16:56:59 8 Comments

I saw a line of C that looked like this:

!ErrorHasOccured() ??!??! HandleError();

It compiled correctly and seems to run OK. It seems to check whether an error has occurred and, if it has, to handle it. But I'm not really sure what it's actually doing or how it's doing it. It does look like the programmer is trying to express their feelings about errors.

I have never seen the ??!??! before in any programming language, and I can't find documentation for it anywhere. (Google doesn't help with search terms like ??!??!). What does it do and how does the code sample work?


@Dimitris Fasarakis Hilliard 2016-03-25 02:24:53

As already stated, ??!??! is essentially two trigraphs (??! and ??! again) mushed together that get translated to ||, i.e. the logical OR, by the preprocessor.

The following table containing every trigraph should help disambiguate alternate trigraph combinations:

Trigraph   Replaces

??(        [
??)        ]
??<        {
??>        }
??/        \
??'        ^
??=        #
??!        |
??-        ~

Source: C: A Reference Manual 5th Edition

So a sequence of two trigraphs like ??(??) will eventually map to [], ??(??)??(??) will get replaced by [][], and so on; you get the idea.
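For anyone who wants to see the table in action without relying on compiler flags, here is a minimal sketch of the substitution in plain C. The names trigraph_replacement and replace_trigraphs are made up for illustration; a real compiler does this internally and this is not how any particular one implements it.

```c
/* Map the third character of a "??x" sequence to its replacement,
   or return 0 if "??x" is not one of the nine trigraphs in the table. */
static char trigraph_replacement(char third)
{
    switch (third) {
    case '(':  return '[';
    case ')':  return ']';
    case '<':  return '{';
    case '>':  return '}';
    case '/':  return '\\';
    case '\'': return '^';
    case '=':  return '#';
    case '!':  return '|';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical helper: copy src to dst, replacing every trigraph.
   The output is never longer than the input, so dst needs room for
   strlen(src) + 1 bytes. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        char r = 0;
        /* src[2] is only read when src[0] and src[1] are both '?',
           so the terminating '\0' is never skipped. */
        if (src[0] == '?' && src[1] == '?')
            r = trigraph_replacement(src[2]);
        if (r != 0) {
            *dst++ = r;       /* emit the single replacement character */
            src += 3;         /* and skip the whole three-character sequence */
        } else {
            *dst++ = *src++;  /* ordinary character, copy unchanged */
        }
    }
    *dst = '\0';
}
```

Passing the sequence from the question through replace_trigraphs leaves || in the output buffer, and ??(??) comes out as [].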

Since trigraphs are substituted during preprocessing you could use cpp to get a view of the output yourself, using a silly trigr.c program:

int main(void) { const char *s = "??!??!"; return 0; }

and processing it with:

cpp -trigraphs trigr.c

You'll get console output (linemarker lines omitted) of

int main(void) { const char *s = "||"; return 0; }

As you can notice, the option -trigraphs must be specified or else cpp will issue a warning; this indicates how trigraphs are a thing of the past and of no modern value other than confusing people who might bump into them.

As for the rationale behind the introduction of trigraphs, it is better understood when looking at the history section of ISO/IEC 646:

ISO/IEC 646 and its predecessor ASCII (ANSI X3.4) largely endorsed existing practice regarding character encodings in the telecommunications industry.

As ASCII did not provide a number of characters needed for languages other than English, a number of national variants were made that substituted some less-used characters with needed ones.

(emphasis mine)

So, in essence, some needed characters (those for which a trigraph exists) were replaced in certain national variants. This leads to the alternate representation using trigraphs comprised of characters that other variants still had around.

@Joel Falcou 2011-10-19 16:58:56

It's a C trigraph. ??! is |, so ??!??! is the operator ||

@Joel Falcou 2017-03-23 19:45:01

Trigraphs come from a period when some keyboards didn't have all the keys they have now. They also helped when some text editors reserved special characters for special things. It's mostly a relic of the past and a quiz enabler ;)

@Owl 2019-01-11 18:06:37

Because some keyboards apparently don't have "|" so some people have no option but to headbutt the keyboard repeatedly until a trigraph occurs that gives them the symbols they need.

@David R Tribble 2019-10-25 21:08:14

And then there is the <iso646.h> header file.

@DigitalRoss 2011-10-19 21:09:06

Well, why this exists in general is probably different than why it exists in your example.

It all started half a century ago with repurposing hardcopy communication terminals as computer user interfaces. In the initial Unix and C era that was the ASR-33 Teletype.

This device was slow (10 cps) and noisy and ugly and its view of the ASCII character set ended at 0x5f, so it had (look closely at the pic) none of the keys:

{ | } ~ 

The trigraphs were defined to fix a specific problem. The idea was that C programs could use the ASCII subset found on the ASR-33 and in other environments missing the high ASCII values.

Your example is actually two of ??!, each meaning |, so the result is ||.

However, people writing C code almost by definition had modern equipment,[1] so my guess is that someone was showing off or amusing themselves, leaving a kind of Easter egg in the code for you to find.

It sure worked: it led to a wildly popular SO question.

ASR-33 Teletype


1. For that matter, the trigraphs were invented by the ANSI committee, which first met after C became a runaway success, so none of the original C code or coders would have used them.

@Steve314 2011-10-20 04:06:25

It's not the only case of missing characters, in the keyboard and the character set. The Commodore 64 is likely to be more familiar to a lot of people in their late thirties and upwards - the displayed character sets both lacked braces (and probably the bar and tilde too) - in this case because the "ASCII" wasn't ASCII. In ECMA-6 (almost always called ASCII, but not US-ASCII) there were 18 region-specific codes, but I don't know which codes they were. The one thing I can say for sure - in the British "ASCII", # was replaced with £. In other regions, maybe "ASCII" had no braces etc.

@dan04 2011-10-20 06:16:26

The similar ATASCII character set for Atari 8-bit computers also lacked { } as well as ~ and `.

@Ilmari Karonen 2011-10-20 13:36:24

See these two Wikipedia articles. I'm just about old enough to still remember the era of 7-bit national charsets (although I'm sure they still linger on in some dark unswept corners), and the book I first learned C from found it necessary to warn about the possibility of if (x || y) { a[i] = '\0'; } looking like if (x öö y) ä aÄiÅ = 'Ö0'; å in the wrong charset.
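To make that garbling concrete, here is a rough sketch that renders C source the way such a terminal would have shown it. It covers only the six codes visible in the example above; the mapping is the Finnish/Swedish one as far as I know, and national_glyph and render_national are made-up names. The output is produced as UTF-8 purely for convenience.

```c
#include <stddef.h>
#include <string.h>

/* Return the UTF-8 text a Finnish/Swedish 7-bit national variant would
   display for this code, or NULL if it looks the same as in US-ASCII.
   Assumption: only the six codes from the example are handled. */
static const char *national_glyph(char c)
{
    switch (c) {
    case '[':  return "Ä";
    case '\\': return "Ö";
    case ']':  return "Å";
    case '{':  return "ä";
    case '|':  return "ö";
    case '}':  return "å";
    default:   return NULL;
    }
}

/* Hypothetical helper: write the national rendering of src into dst.
   Each replaced code becomes two bytes of UTF-8, so dst needs room
   for 2 * strlen(src) + 1 bytes. */
void render_national(const char *src, char *dst)
{
    for (; *src != '\0'; src++) {
        const char *glyph = national_glyph(*src);
        if (glyph != NULL) {
            size_t n = strlen(glyph);
            memcpy(dst, glyph, n);   /* copy the multi-byte letter */
            dst += n;
        } else {
            *dst++ = *src;           /* invariant character, unchanged */
        }
    }
    *dst = '\0';
}
```

Feeding it the if statement from the comment above reproduces the garbled line quoted there, which is exactly the situation trigraphs were meant to work around.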

@Karoly Horvath 2011-10-21 22:26:20

@Steve314: well, on C64 you could rewrite the character-set so you can use whatever encoding you like :)

@DigitalRoss 2011-10-26 02:45:27

Another interesting historical note is that Unix (which was the big platform C rode in on) may have been the first system of any significance (and maybe the first overall) to default alphabetic values to lower case rather than upper case. Although I haven't seen with my own eyes many contemporary systems, I think this was a real sign of sophistication. Besides being really the only decent OS, Unix also converted your upper case to lower, rather than vice versa. Those guys were really cool.

@Phil Perry 2014-04-11 18:25:15

Funny story I gotta tell ya... the IBM RS/6000 workstation's XL Fortran compiler was developed from the XL C compiler. In the first few releases, they accidentally left in the trigraph processing, so there were some legit Fortran character sequences (in a literal string, IIRC) that were misinterpreted as C trigraphs, leading to some interesting bugs!

@supercat 2014-06-09 16:57:38

@DigitalRoss: I know the Commodore came later, but whether upper- or lower-case was the default depended upon which character set was selected. Further, the relationship between screen character codes and the code used by the "getch/putch" equivalent methods in the Kernel was rather bizarre. Apple computer's putch equivalent used screen codes, which had bit 7 set for normal text. Further, until the 80-column card was released for the //e, even machines which could show lowercase could only do so in non-inverse text.

@psmears 2015-10-26 10:23:04

@Steve314: In case you want to refresh your memory on the exact codes, the ECMA-6 standard is available online :)

@Gus 2016-07-28 20:00:24

Such a nice answer but it sure looks like the key to the right of 0 is actually a | key (with * above)...

@DigitalRoss 2016-07-28 20:29:37

@Gus -- that's actually a : -- better pic here:

@supercat 2017-10-25 21:22:06

Trigraphs are horrid. The only character which needs to be synthesizable within quoted strings is the backslash ("meta") character, and that could have been accommodated by saying that if the first line of a file contains only one character, and that character is either a backslash or is not in the C character set, the compiler should treat that character as the meta character from then on. All other characters could then be made available as "backslash" escapes. Compilers that don't know anything about this feature would simply ignore a line that contains a backslash and nothing else.

@Lundin 2018-11-21 15:15:58

The anecdote I've heard from someone who was on the C90 committee is that by the time of ISO standardization, they still couldn't exclude trigraphs. The issue was that some keyboards for some languages (for example Nordic languages and German) were still missing various exotic symbols like |, { etc., sacrificed in favour of having keys for all the letters of their alphabets.

@user786653 2011-10-19 16:58:44

??! is a trigraph that translates to |. So it says:

!ErrorHasOccured() || HandleError();

which, due to short circuiting, is equivalent to:

if (ErrorHasOccured())
    HandleError();
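If you want to convince yourself of the equivalence, a small sketch like the following works. The two functions here are stand-ins I made up with the question's names, and HandleError is assumed to return int: a function returning void could not be used as an operand of ||.

```c
#include <stdbool.h>

static bool error_flag;     /* hypothetical error state */
static int  handled_count;  /* how many times the handler ran */

/* Stand-ins for the functions in the question. */
static bool ErrorHasOccured(void) { return error_flag; }
static int  HandleError(void)     { handled_count++; return 0; }

/* The one-liner from the question, with the trigraphs written as ||.
   Returns how many times HandleError ran. */
int run_short_circuit(bool error)
{
    error_flag = error;
    handled_count = 0;
    /* The cast to void only silences "value unused" warnings. */
    (void)(!ErrorHasOccured() || HandleError());
    return handled_count;
}

/* The plain if statement it is equivalent to. */
int run_if_statement(bool error)
{
    error_flag = error;
    handled_count = 0;
    if (ErrorHasOccured())
        HandleError();
    return handled_count;
}
```

Both functions report one handler call when there is an error and none when there is not, because the right-hand side of || is only evaluated when the left-hand side is false.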

Guru of the Week (deals with C++ but relevant here), where I picked this up.

Possible origin of trigraphs; or, as @DwB points out in the comments, it's more likely due to EBCDIC being difficult (again). This discussion on the IBM developerworks board seems to support that theory.

From ISO/IEC 9899:1999 §, footnote 12 (h/t @Random832):

The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.

@Martin Beckett 2011-10-19 17:02:49

Trigraphs were originally needed in case your keyboard didn't have e.g. a '|' symbol. Here it's either the programmer deliberately being annoying or some bizarre editor 'feature'.

@Peter Olson 2011-10-19 17:02:52

So it's relying on || short-circuiting or something?

@user786653 2011-10-19 17:05:00

Yeah, it's equivalent to if (ErrorHasOccured()) HandleError(). Thankfully you usually only encounter this idiom in perl code.

@Brian Roach 2011-10-19 17:05:06

@PeterOlson - correct. If !ErrorHasOccured() resolves to true then it short circuits, otherwise HandleError() is then called.

@DwB 2011-10-19 17:23:49

Trigraphs were added for EBCDIC computers.

@Steve Jessop 2011-10-19 17:49:09

@user786653: even in Perl, surely that's what unless is for?

@user786653 2011-10-19 17:52:42

@SteveJessop: I wasn't trying to endorse it. It's just where I've most often seen it, e.g. open(...) or die(...).

@Random832 2011-10-19 18:01:24

It's not necessarily EBCDIC - the set of characters that require trigraphs almost exactly matches the set of characters that are not invariant in ISO-646 (i.e. the old 'national ascii' standards).

@ninjalj 2011-10-19 18:10:14

@Random832: the standard has a footnote saying: The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.

@starblue 2011-10-20 06:57:22

German and Swedish variants had the letter 'ö' in that code position.

@Yam Marcovic 2011-10-24 15:01:22

A perfectly readable alternative would be ErrorHasOccurred() && HandleError(); That is, if you're used to shell scripting. :)

@Ben Lee 2011-10-25 19:21:35

What Yam said (only in my case, replace "shell" with "ruby").

@Ben Strawson 2011-10-25 21:47:41

What is to stop the compiler deciding that the call to HandleError() should execute first?

@Puppy 2012-08-08 18:01:18

Boolean operators are strictly evaluated LTR, IIRC.
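To illustrate: for || (and &&) the C standard puts a sequence point after the left operand, so the compiler is not free to run the right-hand call first. A quick sketch that records the order; left, right and evaluation_order are names invented for this example.

```c
static char order_log[8];  /* records which operands were evaluated */
static int  order_len;

static int left(int value)  { order_log[order_len++] = 'L'; return value; }
static int right(int value) { order_log[order_len++] = 'R'; return value; }

/* Evaluate left(left_value) || right(1) and return the recorded
   order as a string such as "LR". */
const char *evaluation_order(int left_value)
{
    order_len = 0;
    (void)(left(left_value) || right(1));
    order_log[order_len] = '\0';
    return order_log;
}
```

With a false left operand the log reads LR, never RL; with a true left operand it reads just L, since the right operand is skipped entirely.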

@Luciano 2015-04-15 15:35:29

Just note that many coding standards specifically ban the use of Trigraphs and Digraphs, and many compilers & static analyzers will flag their use.

@SparkyRobinson 2015-04-15 18:06:56

I still can't read that: !ErrorHasOccured() || HandleError(); - How is that read in normal English? 'If there's an error OR handle error?' - if there is an error or you can handle an error, do it?

@Omar Antolín-Camarena 2015-04-15 18:09:17

Read it as "Either no ErrorHasOcurred or you must HandleError", @SparkyRobinson.

@Peter R 2015-04-15 22:10:46

@user786653 in perl it would be more expressive to write: handleError() if errorHasOccured() or something like handleError() unless isSuccess()

@user3125367 2015-04-17 15:27:58

@SparkyRobinson It reads as it states: (error not occurred) or (handle error).

@1737973 2015-06-17 19:16:27

man cobfusc on digraphs: The amendment 1 (1994) changes to ANSI X3.159-1989 ("ANSI C89") are supported only by few C compilers (BSD July 1, 2001).

@val says Reinstate Monica 2018-12-15 17:12:20

Not valid since C++17 :|

@mckenzm 2019-04-21 06:20:49

C++ is not C. Even if C is valid C++. It's not just that the pipe may not be on the keyboard: 3270 terminals have 2 pipes, and there is a risk that either will be used when passed back from ASCII land. As an aside, there are other character substitutions, and I coded C on a Cyber having to use # for quotes early on in my experience.

@David R Tribble 2019-10-25 21:06:43

Of course, you could just #include <iso646.h> and use the or operator macro.
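For example, something along these lines should work in C, where <iso646.h> defines or, and, not and friends as macros (in C++ they are keywords already). The names handle_error and check are placeholders for this sketch, and the handler again returns int so it can be an operand.

```c
#include <iso646.h>   /* in C: macros such as or (||), and (&&), not (!) */
#include <stdbool.h>

static int handler_calls;  /* how many times the handler ran */

/* Placeholder handler; returns int so it can be an operand of "or". */
static int handle_error(void) { handler_calls++; return 0; }

/* Same idiom as in the question, spelled with the iso646.h macros.
   Returns how many times the handler was called. */
int check(bool error_occurred)
{
    handler_calls = 0;
    (void)(not error_occurred or handle_error());
    return handler_calls;
}
```

Here check returns 1 when an error occurred and 0 otherwise, the same behaviour as the || version, just without any punctuation that a national keyboard might lack.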
