Syntax, Semantics, and Naked Emperors: Lessons Students Can Learn From an Academic Hoax

16/05/2014 10:29 BST | Updated 15/07/2014 10:59 BST

Research students at the esteemed Massachusetts Institute of Technology (MIT) have devised a computer programme which automatically writes academic papers. The freely available programme, called SCIgen, has met with some success: some of the papers it has produced have been accepted by academic journals and conferences despite being devoid of meaning. Academic hoaxes are nothing new and include, perhaps most famously, the Sokal hoax, in which a physicist succeeded in publishing a gibberish paper by passing it off as postmodernist scholarship. So, what can students learn from this latest in a long line of academic hoaxes about being successful in course assignments?

The most important lesson that students can learn from the SCIgen affair lies in the way that the developers have identified the structure of a good research paper. SCIgen's developers have managed to produce papers which are syntactically appropriate even though they are devoid of any semantics. The SCIgen outputs are reminiscent of 'Colorless green ideas sleep furiously', a famous sentence offered in 1957 by the linguist Noam Chomsky, also of MIT, as an example of a sentence which has correct syntax but no meaning. Although SCIgen currently only produces computer science papers (with the fields of risk management and information security being the next priority), the point about good written form, even if the content of a piece of writing is severely problematic, is nevertheless vital for all students to realise.

An examination of the SCIgen computer code reveals lists of names, initials, known authors, nouns, verbs, adjectives, buzzwords, and so forth. There are also instructions for automatically producing diagrams and graphs, not to mention the all-important rules about how the randomly selected individual elements should be combined and formatted. So, for example, to produce a title SCIgen selects from a list of different forms of title, with each form including a number of 'blanks'. SCIgen then selects from a list of words to randomly fill in these blanks in cases where they are not specified by the user. One form for a title in the SCIgen code is 'SCI_TITLE_PREFIX a methodology for the SCI_ACT'. The first blank of this generic title form, SCI_TITLE_PREFIX, can be specified by the user, so anything that sounds good can be chosen (otherwise SCIgen fills in the blank itself). The second blank, SCI_ACT, is in turn made up of other blanks, which include a reference to a list of words such as robots, spreadsheets, and Moore's Law. The upshot is that SCIgen can create a title that doesn't mean anything at all yet has the necessary form of a title. One example of a research paper title that could be generated by SCIgen would be 'Imaging System: A Methodology for the Understanding of Robots that would Allow for Further Study into Spreadsheets'. Other titles that have been generated, using different title forms, include 'Cooperative, Compact Algorithms for Randomized Algorithms' and 'The Influence of Probabilistic Methodologies on Networking'. Similar procedures are used for producing an abstract, citations, the text of the article, and a bibliography.
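
The blank-filling procedure described above can be sketched in a few lines of Python. This is not SCIgen's own code, only an illustration of the idea. The blank names SCI_TITLE_PREFIX and SCI_ACT come from the description above; everything else (the rule table, the extra blanks SCI_TITLE and SCI_THING, the word lists, the colon in the title form, and the function names expand and generate_title) is an assumption made for the sake of the example.

```python
import random
import re

# Hypothetical rule table, loosely modelled on the description above.
# SCI_TITLE_PREFIX and SCI_ACT are named in the article; SCI_TITLE,
# SCI_THING and every word list here are invented for illustration.
RULES = {
    "SCI_TITLE": ["SCI_TITLE_PREFIX: a methodology for the SCI_ACT"],
    "SCI_TITLE_PREFIX": ["Imaging System", "Quill", "Tern"],
    "SCI_ACT": [
        "understanding of SCI_THING",
        "study of SCI_THING",
        "understanding of SCI_THING that would allow for further study into SCI_THING",
    ],
    "SCI_THING": ["robots", "spreadsheets", "Moore's Law"],
}

# A blank is any token of the form SCI_SOMETHING.
BLANK = re.compile(r"SCI_[A-Z_]+")


def expand(text, rng, overrides=None):
    """Fill every blank in text, recursively, until no blanks remain."""
    overrides = overrides or {}

    def fill(match):
        name = match.group(0)
        if name in overrides:
            # The user specified this blank, so use the value as given.
            return overrides[name]
        # Otherwise pick one option at random and fill its own blanks too.
        return expand(rng.choice(RULES[name]), rng, overrides)

    return BLANK.sub(fill, text)


def generate_title(rng, prefix=None):
    """Produce a title with the right form and no guaranteed meaning."""
    overrides = {"SCI_TITLE_PREFIX": prefix} if prefix else {}
    return expand("SCI_TITLE", rng, overrides)


if __name__ == "__main__":
    rng = random.Random()
    print(generate_title(rng))
    print(generate_title(rng, prefix="Imaging System"))
```

The design choice worth noticing is that a blank may be filled with text that itself contains blanks, which is why a handful of short lists can yield a large number of different titles.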


From my own experience, I believe too little emphasis is given to good written form in British university courses. Students are usually expected to acquire good writing skills in the process of their studies and from the feedback they receive on written work. But there is almost no formal training given in effective written communication for academic purposes, which is why I often recommend that students take the initiative and refer to books like The Study Skills Handbook by Stella Cottrell or The Craft of Research by Booth, Colomb, and Williams.

Despite differences in student assignments and writing conventions across academic disciplines, all students can develop their own procedures for producing written work with good form, just as the developers of SCIgen have. These procedures need to be developed from an analysis of the writing of a favourite academic author or the works on a course reading list. Journal articles are a good model for student written work because they are each focussed on a specific matter and written in an academic style. Also, due to being relatively short, the structure of a journal article is easier to assess.

The following exercises should help most students to begin devising their own procedures for producing written work that has good form and which, therefore, meets with some success.

Exercise 1. Look at the titles of your chosen collection of exemplary academic written work in your subject. You should find that most of the titles of these articles are wordy and specific rather than short and vague. For example, you will be more likely to find titles of the form 'The Epistemological Concept of "Religious Ambiguity" in Hick's Religious Pluralism' than of the form 'Hick's Religious Pluralism'. Another example would be 'An Empirical Study into the Effects of Caffeine Consumption on Anxiety Levels in UK Adolescents' rather than 'Coffee and Well-Being in Teenagers'.

Exercise 2. An academic article should make a claim which is supported by evidence. Usually the claims being made are very modest and highly qualified. Examine the types of claim that are made in academic writing, how they are stated, where they are stated, and how often they are stated. How are the claims related to the evidence? What types of words and phrases are used to relate the claims to the evidence? For example, is evidence said to prove or to suggest, to refute or to undermine? Develop your own stock of words and phrases for making claims and supporting them.

Exercise 3. Paragraphs can serve different functions. Some give background information and some give the order of discussion in a piece of academic writing. Yet other paragraphs can be thought of as the building blocks of the overall argument, themselves being mini-arguments: the first sentence introducing a claim, the following sentences supporting the claim, and the final sentence restating the claim. See what different types of paragraphs are used in exemplary written work and identify the internal structure of each.

Exercise 4. It will often be said that a good piece of writing needs to have good 'flow'. This means that the sentences within the paragraphs, as well as the paragraphs themselves, need to build upon each other in order to gradually persuade the reader of a particular point of view. The consequence of sentences and paragraphs not flowing well is that a piece of writing ends up being difficult to understand. In the context of assessment, an assignment that is difficult to understand will give a bad impression to the person doing the grading. There are words and phrases which can aid flow within a paragraph, such as 'so', 'then', 'moreover', 'furthermore', 'in addition', 'consequently', 'it follows that', and so forth. There are also words which kill flow, and these are words that introduce ambiguity. It is not a good idea, for example, to refer to a person or concept in a previous sentence without it being clear which person or concept is being referred to, or even which sentence! So, be careful when using 'it', 'he', 'she', 'them', 'these', 'they', 'this', 'that', and so forth, especially as the first word of a sentence. A little bit of repetition may help dispel ambiguity.
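
For students who like to automate their checks, the warning about ambiguous words at the start of a sentence can be turned into a rough first pass over a draft. The sketch below is only an aid to rereading: it assumes that a sentence ends at a full stop, question mark, or exclamation mark followed by a space, and the function name flag_ambiguous_openers is made up for this example. It cannot tell whether a flagged word really is ambiguous; that judgement remains with the writer.

```python
import re

# The words singled out above as risky, especially at the start of a sentence.
AMBIGUOUS_OPENERS = {"it", "he", "she", "them", "these", "they", "this", "that"}


def flag_ambiguous_openers(text):
    """Return the sentences whose first word is one of the risky words."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        # Find the first run of letters, skipping any opening quotation mark.
        first_word = re.search(r"[A-Za-z]+", sentence)
        if first_word and first_word.group(0).lower() in AMBIGUOUS_OPENERS:
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    draft = (
        "Hick proposed a form of pluralism. This was controversial. "
        "Critics disagreed. They objected to it."
    )
    for sentence in flag_ambiguous_openers(draft):
        print("Check the reference in:", sentence)
```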

Exercise 5. Count the number of words used in sentences and then the number of sentences used in paragraphs. Although there will be some variation, generally a sentence will have fewer than 40 words (not including in-text citations) and a paragraph will have around 5 sentences.
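
The counting in Exercise 5 can be done by hand, but a short script makes it easier to compare several articles. What follows is a minimal sketch under some loud assumptions: paragraphs are separated by blank lines, sentences end at a full stop, question mark, or exclamation mark followed by a space, and an in-text citation is any bracketed text containing a four-digit year, such as (Smith, 2010). The function names and the shape of the report are invented for this example.

```python
import re

# Rough guess at an in-text citation: brackets containing a four-digit year.
CITATION = re.compile(r"\([^()]*\d{4}[^()]*\)")
# A word is a run of word characters, allowing internal apostrophes or hyphens.
WORD = re.compile(r"\w+(?:['’-]\w+)*")


def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive split after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def word_count(sentence):
    """Count words, leaving out anything that looks like a citation."""
    return len(WORD.findall(CITATION.sub(" ", sentence)))


def summarise(text, max_words=40):
    """Report sentence and word counts for each paragraph."""
    report = []
    for number, paragraph in enumerate(split_paragraphs(text), start=1):
        sentences = split_sentences(paragraph)
        counts = [word_count(s) for s in sentences]
        report.append({
            "paragraph": number,
            "sentences": len(sentences),
            "words_per_sentence": counts,
            # Sentences at or over the limit are worth a second look.
            "long_sentences": [s for s, n in zip(sentences, counts) if n >= max_words],
        })
    return report


if __name__ == "__main__":
    sample = (
        "Coffee is popular. It may raise anxiety (Smith, 2010).\n\n"
        "This is a second paragraph."
    )
    for entry in summarise(sample):
        print(entry)
```

Abbreviations such as 'e.g.' will confuse the sentence splitter, so the figures are best treated as approximate.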

Exercise 6. Scrutinise the way that different types of punctuation are used. Think about some of the following questions. How many commas tend to be used in a sentence? When are dashes and brackets used, if at all? How common is it to use colons and semicolons? Does punctuation aid clarity (and, therefore, reader understanding) or not? Once you have analysed the use of punctuation, choose some sentences with a lot of different types of punctuation. See whether rewriting them with less punctuation (for example, as separate sentences) makes them clearer.

Exercise 7. To really give your written work an air of perfection, try to style it as academic authors style their written work. Italicise a word in order to emphasise it (do not use bold or underline). Italics are also used for book titles and foreign words like Schadenfreude. For quotations use single quotation marks and for quotations within quotations use double quotation marks (the US convention is the other way round). Place the titles of book chapters and journal articles in single quotation marks (double quotation marks in the US). Long quotations (say, over forty words) need to be set in from both the left and right margins and separated from the text which comes before and after them (if you analyse enough written work you will see what I mean). Format your bibliography in the same way as academic authors do, with all the same information that they supply, especially author name(s), date of publication, publisher name, and place of publication. For more guidance on these topics refer to any notes given by your academic institution or ask your lecturer to refer you to a 'style guide' which is commonly used in your discipline. Two comprehensive style guides are the New Oxford Style Manual and the Chicago Manual of Style, but they offer more than most students will need. It can be tempting to devise your own style conventions but they will not cut the mustard in the same way.

These exercises are all designed to develop procedures which emulate (to use a computer science term) good written work. Many more exercises could have been given but hopefully the point is clear: there are definite steps which can be taken to improve the quality of your written work even if you are struggling to understand your course material. Just as it is said that success is 90% perspiration and 10% inspiration, perhaps we can also say that a well-written assignment is 90% form and only 10% content.

There is one more thing which has to be said about SCIgen. If academics are asked to review or grade work which has perfect form, despite it lacking meaning, then there may be a 'naked emperor' effect. In Hans Christian Andersen's story, 'The Emperor's New Clothes', the gathered people were afraid to point out, due to personal insecurities, that their vain emperor was in fact naked. Similarly, academics may find it difficult to challenge a meaningless paper which appears to be well written. After all, unless an academic has reason to expect a hoax, they may fear that their inability to comprehend a written work lies in their own lack of knowledge rather than in the work being gobbledegook.