ASCII – Origins

BareNakedCoder · on Nov 29, 2020

IMHO, the biggest RTFM-Fail ever is related to ASCII. Think of all the errors made and time lost fixing issues with CSV (comma separated value) or TSV (tab separated files) or pipe-delimited fields in records, etc, etc due to having to handle the case where the delimiter character is not a delimiter character but valid data with a field.

ASCII provided characters specifically for this purpose. But no one seems to have RTFMed. ASCII 31 is a "unit separator" (or field separator as we'd call it today) and ASCII 30 is a record separator. There's even ASCII 29, a group separator, so you can have a set of records related as a group (for example, a group of records of different type but related to a single customer). And there's ASCII 28, a file separator, so you can have multiple "logical" files within one physical file.

xyzzy_plugh · on Nov 29, 2020

What if you want to embed a table using the field separator in a table using the field separator?

There is no fail here, this is what escaping is for. And with escaping, you can nest infinitely.

codeflo · on Nov 30, 2020

I think that's the unfortunate consequence of people without a basic CS background inventing ad-hoc data formats. I recently had to deal with a kind of "nested CSV", where each level of nesting introduced a new delimiter! Who needs recursion?

rabidrat · on Nov 30, 2020

Escaping drastically complicates parsing (though not as much as quoting, thank god). But you seldom need to nest tables within tables. Much more frequently, table values want to contain embedded newlines.

Arnavion · on Nov 30, 2020

A general-purpose parser has to be able to handle the possibility of nested tables, so it has to account for escaping. And once you have an escaping-aware parser the choice of delimiter is no longer critical.

rabidrat · on Nov 30, 2020

Escaping is an unnecessary kludge in 99% of cases. If you need nested tables, then you need a special-purpose format that allows for it--and it will very likely look more like a binary filesystem than a parseable text file. If you don't need nested tables, then escaping is unnecessary if the format uses delimiters which aren't present in the data. A lot of data includes newlines, some data includes tab characters, but only binary data would ever include the ASCII specified delimiters.

Don't get caught in the CS trap of thinking you need a completely "general purpose" parser, when a slightly-less-than-completely-general parser will be significantly easier to write, simpler to understand and debug, and faster to execute. And as mentioned above, if you do find yourself needing more than this parser, then you should be looking at non-parsing based approaches anyway.

acqq · on Nov 30, 2020

Luckily people who designed programming languages had a better idea: introducing some begin and end markers can allow nesting without escaping them. Here I have two blocks nested inside of the outer block but I don't have to escape anything: { { } { } }

h2odragon · on Nov 30, 2020

Some financial protocols use those in anger still. Credit card auths and such. There were some serial things that used them as intended too, transfer protocols etc.

iirc "Relia COBOL" back in the early days of DOS also had a format that used the delimiters and all properly; but then did goofy things like globally translating "'" (apostrophe) to "*" (asterisk) anyway.

tinus_hn · on Nov 30, 2020

Even worse, in CSV the chosen separator may, depending on locale, also be used in number values which do not need quoting.

emptybits · on Nov 29, 2020

Text of the 1968 memorandum by President Lyndon Johnson mandating ASCII as the federal standard.

[1] https://www.presidency.ucsb.edu/documents/memorandum-approvi...

[2] https://web.archive.org/web/20070914121230/http://www.presid...

eesmith · on Nov 29, 2020

> In May 1961, an IBM engineer, Bob Bemer, sent a proposal to the American National Standards Institute (ANSI) to develop a single code for computer communication. ANSI created the X3.4 Committee,

Hmm. That seems to leave out quite a bit.

Eric Fischer (an HN'er) wrote "The Evolution of Character Codes, 1874-1968", available from https://archive.org/details/enf-ascii/page/n23/mode/2up .

In it, Eric writes:

> By 1955, Herbert Grosch had become sufficiently concerned about the growing incompatibility of character codes that he urged the attendees of the Eastern Joint Computer Conference to ‘‘register common codes so that ...

and

> The American Standards Association (ASA) got involved in character code standardization on August 4, 1960, when it created the X3.2 subcommittee for Coded Character Sets and Data Format.

Bemer's letter in 1961 therefore doesn't seem to have been a proposal to develop a single code, because that was in the works. Rather:

> The March 8-9, 1961 meeting of X3.2 finally led to a code (based on a proposal by Robert W. Bemer, Howard J. Smith, and F.A. Williams) that nearly everyone could agree upon—but there is some dis-agreement about exactly what it was that was agreed.

And that's in March 1961, well before May 1961.

Indeed, May 1961 is when Bemer et al. wrote the CACM article "Design of an Improved Transmission/Data Processing Code" - http://www.ed-thelen.org/comp-hist/ImprovedDataProcessingCod... .

rootbear · on Nov 29, 2020

The Univac 1100 computers I used in college used the Fieldata character set. I only recently learned that Fieldata was the name of a whole program of the US Military in the late 1950s to create a standard for military and battlefield information processing. The Fieldata character code was developed for that project. I've read that Fieldata was one of the major influences on ASCII.

zokier · on Nov 29, 2020

The cardinal sin of ASCII was to mix in TTY control plane with the character data in one monolithic standard, instead of layering the two more cleanly allowing us to swap in more appropriate control planes for different transports/devices instead of needing to assign some sensible semantics to the TTY codes.

anonymousiam · on Nov 29, 2020

Baudot still lives.

https://www.sigidwiki.com/wiki/Radio_Teletype_(RTTY)

https://en.wikipedia.org/wiki/Baudot_code

Woodi · on Nov 29, 2020

What a strange sentence:

"When IBM released its game-changing System/360 in 1964, the head of the development team, Frederick Brooks, decided that its punch cards and printers were not yet capable of using ASCII."

Brooks decided or just stated a state of reality that currently available peripherial hardware is uncapable of using newly proposed standard ?

Also how - as usual - convenient, that IBM keeped machines uncompatibe while proposing others to make their systems accesible :>

Also 'ASCII: Origins' is more fashionable - more like what movies do :)

BareNakedCoder · on Nov 29, 2020

FYI, the IBM S/360 did support ASCII[0]. A bit in the program status word indicated if the machine was running in ASCII or EBCDIC mode. Most customers ran EBCDIC because it was more compatible with their masses of punch card originated data (no translation required). For the S/370, for virtual memory, a bit in the program status word was needed to indicate addresses were real or virtual. IBM surveyed its customers and found almost no one was using ASCII so its bit was taken for virtual memory.

[0] Page 70 of http://bitsavers.org/pdf/ibm/360/princOps/A22-6821-0_360Prin...

jecel · on Nov 29, 2020

There were changes in 1965 and the 1967 version is considered the official one. Companies who adopted ASCII before that, like Xerox and DEC, had a left arrow instead of "_" and up arrow instead of "^" in their equipment.

When Smalltalk-78's internal character set was replaced by ASCII in Smalltalk-80, it was the 1963 ASCII that they adopted since that is what they knew at Xerox. Though the goal was to be friendly to the rest of the world, the fact that almost everyone else saw their assignments as underscore causes problems to this day.

So it is possible that 1964 was too early for IBM to embrace ASCII. But to be fair, the 1963 x 1967 ASCII problems are trivial compared to any ASCII x EBCDIC.

DonnieP · on Nov 29, 2020

Who knew ASCII was Turing-complete? "In 1981, however, ASCII became the new standard when it released its first personal computer featuring the operating system."