Some common mistakes to avoid in scientific writing

As a referee for papers submitted to journals and conferences in the area of computational linguistics, I frequently encounter the same few mistakes. Since conferences do not usually have copy-editors to make sure these mistakes are eliminated before the papers are printed in the proceedings, the only opportunity to eliminate them is for the referee to point them out to the authors in the referee report. Many of the mistakes are, however, of a trivial nature, and much of the referees' precious time can be saved if the authors make sure the mistakes do not occur in the first place. It is also very much in the authors' interest to avoid these mistakes, since finding too many of them may give the referee a bad impression of the authors and their work, and may eventually lead to a less favourable recommendation on acceptance or rejection.

The following is intended as a first attempt at making a list of such trivial, easily avoidable mistakes. If you have anything to add to this list, please let me know (markjan@let.rug.nl).

LaTeX

Language

If one compares the respective styles of writing in, say, computer science and computational linguistics, one finds that the latter seems to be bound by less strict rules. This is not because one needs more "sophisticated" language to describe the intricacies of processing linguistic material by computational means; it is simply because the need for clear and concise scientific writing has not yet sunk in with a considerable part of our community.

If you feel computational linguistics is not a science in the way that mathematics, biology or chemistry are, you should write in whatever way you like. Otherwise, I offer the following simple guidelines.

Descriptions of Algorithms

The computational linguistics literature has an abysmal tradition when it comes to precise descriptions of algorithms. Even today, not enough computational linguists are aware of the possibility of using pseudo code, which has been very common in the computer science literature for decades. Such code contains e.g. It is not at all a good idea to use particular programming languages, such as Lisp or Prolog, to describe an algorithm in a scientific paper, for the following reasons:

Structure

A paper should be divided into sections and paragraphs in a sensible way. Specifically:

Claims

A claim made in a paper should be supported by verifiable arguments or data that give sufficient credibility to the claim. Science is about dialogue and exchanging arguments, not about spreading propaganda. Exactly how much support should accompany a claim is difficult to say, but it is generally true that the less intuitive and the more surprising a claim is, the more support it needs.

For example, if a paper discusses a new grammar formalism accompanied by only a few examples of language phenomena described in that formalism, it would be outrageous for the authors to include the statement "our new formalism now makes all other grammar formalisms redundant". This pretentious statement would make the paper unsuitable for publication, even though the new formalism could actually be a genuine contribution to the field.

Concerning claims of the novelty of material, it is the job of the authors to include references to relevant publications, and it is the job of the referees to check for any missing references. However, explicit claims that the authors are the first ever to come up with some idea or to implement something are often met with scorn: the breadth and depth of the existing literature on computational linguistics make such claims very hard to verify, whereas refuting one is often not that difficult. It should further be noted that (misplaced) claims of novelty do not improve a paper's chances of acceptance, as some authors may naively expect.

It seems valid, however, to write "In the existing method A, processing works this way, whereas in our new method B, processing works that way". The word "new" here should be read as a figure of speech intended to contrast the existing literature with the material in the present paper, not as an absolute guarantee that no one has ever before published or implemented method B.

Concluding Remark

Avoiding the common pitfalls above is, of course, no guarantee of a well-written article. The main difficulty is describing scientific work in such a way that someone other than the authors understands what was meant. Inexperienced authors may benefit from academic writing courses.

Bibliography

I have not tried to find more appropriate textbooks on academic writing, but I'm sure they exist.