Candle unlit

A girl and a boy were walking through the graveyard. She was a skinny, long-haired sable, dressed as a witch. A lace-trimmed shawl covered her arms. He was tall, his hair — a nest-like entangled mess. His face was painted in white and black mimicking a skull.

They were talking, laughing and holding each other — enclosed in a private little world of their own.

It was late in the evening, All Saints’ Day. The Sun was long gone. The Moon lurked through a thick coat of clouds.

But the cemetery was hardly dark.

Countless little lights glimmered in the night. Candles, lit by the living to tell the dead they haven’t been forgotten. To show those on the other side the path back to their loved ones.

A fat old lady was sitting on a bench and praying. In front of her lay two gravestones. One read:

Jan Bramski
1947 - 2011
Beloved husband

The other one — un-engraved.

The girl laughed at something the boy must’ve said, her teeth like little pearls, her laugh loud and cheerful. The old lady turned her head. The sanctum of her thoughts had been desecrated.

“To likho with ’em, youngsters!” she spat.

The wind blew out the light on her husband’s grave. She got up and tried to light it again, but each time a burning match got close to the candle, it went out. Finally she gave up and, resigned, left the churchyard.

The couple noticed nothing. They kept walking, happy and self-sufficient.

The girl broke away chuckling, leaving the shawl clutched in the boy’s hand. His phone rang and he picked up.

The girl, though, didn’t notice. She kept running. When she stopped, she could tell neither what part of the cemetery she was in nor how she had got there. Her companion was nowhere to be seen.

This part of the graveyard wasn’t lit as well as the rest. The graves were old, covered by wild bushes and moss. Nobody visited the dead lying there.

The girl shivered. Her shawl was gone and the night was getting chilly.

On a bench built of half-rotten planks lay a suitcase. She approached it. Who could’ve left it? She noticed the case wasn’t locked, so she lifted the lid.

The boy couldn’t find her anywhere. “Kaśka, where are you?!” he kept shouting, disturbing the stillness of the night.

An old man was going home, to his wife. He didn’t quite remember why he had left, but he surely missed her. He carried a case with him. Some black hair stuck out from under the closed lid.


Creative Commons Licence
Candle unlit by Karol Marcjan is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

On the art of writing comments (in source code)

First of all: from now on the blog is in English. My audience (assuming I have any) shouldn’t have many problems with that. Content in Polish might still appear here from time to time. It’s just that English will be the default language from now on.

Now — to business.

I’ve been fighting my procrastination lately. That includes some programming. But I’ve noticed that for some reason I’ve been writing my comments differently this time.

Code as prose

What I intend to state here is this: you can write code that reads like prose. Or at least code that resembles prose more than your average program.

After all, programs are not only read by computers — they are also read by programmers. And of course, part of any programmer’s education is to be able to understand the language the machine speaks.

But programmers are people. And, apart from those few of us whose bedtime stories were told in x86 assembler, the first languages we learnt were so-called natural languages. Evolution has provided our brains with powerful natural language processing capabilities. We should put them to good use.

And comments like

// Incrementing counter

aren’t exactly the best we can do.

“But I’m not a humanities person!” some of you might say. Well, despite the title of this essay, I do not expect you to turn your comments into art. I expect just some solid craft (that’s what most traditional art consists of anyway).

A program consists of paragraphs

At least a prose-like program does.

But why would you want to write a program as paragraphs? You already have all those ways to structure your code: functions, branching, loops… But a human being’s stream of consciousness has no nested scopes.

Don’t get me wrong: nested scopes are a wonderful way to manage complexity. That’s why they’ve mostly superseded goto and labels, to the point where many modern languages do not support goto at all.

And when you sit down to read a program built of paragraphs it’ll still contain functions, branches and loops. Those two layers will exist orthogonally to each other. Orthogonally in the sense that they won’t interfere with each other in ways that introduce complexity, not in the sense that there will be no logical connection between them — think orthogonal like a good design in engineering.

But what exactly am I talking about? Let’s see an example.

Imagine you are writing a lexer (I, as a matter of fact, currently am). What’s the job of a lexer? If you don’t know yet, go and see. I’ll wait.

You back? Right, now let’s go on.

It sometimes happens in lexers that you encounter whitespace and need to skip it. Go’s fmt package has an interface, fmt.Scanner, for objects which can be scanned (with C-like functions such as fmt.Scan or fmt.Fscan).

These Scanners have a Scan method which accepts an fmt.ScanState as an argument. You can read characters from one of those. You can also “push” the last character back, so it will be read again next time. You can push the same character back multiple times, but you cannot go more than one character back. It also has a method to skip any whitespace encountered.
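To make the interface concrete, here is a minimal sketch of a type that implements fmt.Scanner. The Word type and the scanWordAndNumber helper are made up for illustration and are not part of the lexer discussed here; only the fmt.ScanState methods (SkipSpace, ReadRune, UnreadRune) and fmt.Sscan come from the standard library.

```go
package main

import (
	"fmt"
	"unicode"
)

// Word is a hypothetical type, not taken from the original lexer. It
// implements fmt.Scanner so that fmt.Sscan and friends can fill it in.
type Word string

// Scan reads one run of letters from the ScanState.
func (w *Word) Scan(state fmt.ScanState, verb rune) error {

	// Leading whitespace carries no meaning here, so let the ScanState
	// skip it for us.
	state.SkipSpace()

	// Collect letters until something else (or an error such as EOF)
	// shows up.
	var letters []rune
	for {
		char, _, err := state.ReadRune()
		if err != nil {
			break
		}

		// Not a letter: "push" it back so whoever reads next sees it
		// again, then stop.
		if !unicode.IsLetter(char) {
			state.UnreadRune()
			break
		}
		letters = append(letters, char)
	}

	if len(letters) == 0 {
		return fmt.Errorf("no letters found")
	}
	*w = Word(letters)
	return nil
}

// scanWordAndNumber is a small helper for trying the Scanner out.
func scanWordAndNumber(input string) (Word, int, error) {
	var w Word
	var n int
	_, err := fmt.Sscan(input, &w, &n)
	return w, n, err
}

func main() {
	w, n, err := scanWordAndNumber("  answer 42")
	fmt.Println(w, n, err)
}
```

Because the space after the word is pushed back rather than swallowed, the integer that follows is scanned by fmt as usual.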

As a part of my lexer I’ve got a TokenScanner structure which (obviously) implements fmt.Scanner. It doesn’t call the ScanState’s methods directly, though, because for each token I want to know where in the input its last character was. The default methods don’t do that for me, so I implement thin wrappers over them to provide the proper logic.

Now, to the code:

// Skip meaningless whitespace, handling proper line and column-position
// counting.
func (ts *TokenScanner) SkipSpace() {

    var char rune

    // Until a non-whitespace character is found or an error is
    // encountered
    nonSpaceFound := false
    errorOccurred := false

    for !nonSpaceFound && !errorOccurred {

        // Get next character
        char = ts.ReadChar()

        // Report error if any
        if ts.Error != nil {
            errorOccurred = true

        // Since it's not a whitespace character we "push" the character
        // back into the ScanState and hop out of the loop.
        } else if !unicode.IsSpace(char) {

            ts.UnreadChar()
            nonSpaceFound = true
        }
    }

}

You might not know the exact syntax, but the code itself is pretty straightforward. But not all of it. In my project the lexer and the parser run concurrently (for the fun of it and also because it’s so easy to write concurrent code in Go).

For reasons you need not know, that means the token itself must contain the error information. Hence, the way I pass errors around is not exactly the idiomatic way to do so.

Everyone would agree that deserves a comment. It might even seem short. What matters is that the comment and the code form a naturally understandable unit, even if the code isn’t complete on its own.

You see the comment before the loop? The first one in the method’s body? Setting the flags to their default values and the looping condition form a unit of understanding — a paragraph.
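The method above depends on parts of the project that are not shown. To try it out, here is a self-contained sketch built on guesses: this TokenScanner wraps a bufio.Reader rather than an fmt.ScanState, and its fields, the NewTokenScanner constructor and the bodies of ReadChar and UnreadChar are all assumptions, not the original implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
)

// TokenScanner is a stand-in for the one in the article. Its fields are
// assumptions made for this sketch.
type TokenScanner struct {
	input *bufio.Reader
	Error error

	// Position of the character that will be read next.
	Line, Column int

	// Position before the most recent read, needed to undo it.
	prevLine, prevColumn int
}

func NewTokenScanner(text string) *TokenScanner {
	reader := bufio.NewReader(strings.NewReader(text))
	return &TokenScanner{input: reader, Line: 1, Column: 1}
}

// Read one character, keeping the line and column counters up to date.
func (ts *TokenScanner) ReadChar() rune {
	char, _, err := ts.input.ReadRune()
	if err != nil {
		ts.Error = err
		return 0
	}

	// Remember where we were, in case the character gets pushed back.
	ts.prevLine, ts.prevColumn = ts.Line, ts.Column

	// A newline moves us to the start of the next line, anything else
	// one column to the right.
	if char == '\n' {
		ts.Line++
		ts.Column = 1
	} else {
		ts.Column++
	}
	return char
}

// Push the last character back and restore the position it was read at.
func (ts *TokenScanner) UnreadChar() {
	if err := ts.input.UnreadRune(); err != nil {
		ts.Error = err
		return
	}
	ts.Line, ts.Column = ts.prevLine, ts.prevColumn
}

// Skip meaningless whitespace, handling proper line and column-position
// counting.
func (ts *TokenScanner) SkipSpace() {

	// Until a non-whitespace character is found or an error is
	// encountered
	nonSpaceFound := false
	errorOccurred := false

	for !nonSpaceFound && !errorOccurred {

		// Get next character
		char := ts.ReadChar()

		// Report error if any
		if ts.Error != nil {
			errorOccurred = true

			// Since it's not a whitespace character we "push" the
			// character back and hop out of the loop.
		} else if !unicode.IsSpace(char) {
			ts.UnreadChar()
			nonSpaceFound = true
		}
	}
}

func main() {
	ts := NewTokenScanner("  \n  foo")
	ts.SkipSpace()
	fmt.Println(ts.Line, ts.Column, string(ts.ReadChar()))
}
```

In this sketch reaching the end of the input also counts as an error (io.EOF ends up in the Error field), which is one simple way of making sure the loop always terminates.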

Code follows comment

Apparently in Anglo-Saxon education they teach kids to write paragraphs that start with so-called topic sentences. First you state something. The rest of the paragraph you spend expanding on it. That’s a valid paragraph structure, but not the only one and by no means the only one you should use for writing in general. Works fine for most comments though.

But some people might get it all wrong. They might think the code is their topic sentence. That’s a bad idea. Imagine what the code above would look like if the comments followed the code. First you’d be seeing some code (pretty clear code in this case, but it often won’t be) and only then trying to understand what it’s supposed to do.

Can you imagine a math or science paper that first presents a formula and then explains how it got to it? Of course not! Everybody would clearly see such a paper is ill-organised.

Math and science papers are similar to source code in that both are written using two languages, one of which is a natural human language and one of which is an artificially created one. And before you throw some of that artificial stuff at your reader it might be a good idea to give him something to ease the introduction.

How long should a comment be?

That example you just saw might not seem like prose to you. The comments aren’t particularly long, nor are they very explanatory.

That is because how you comment depends on what you comment. As I’ve said before — that code was pretty straightforward. It just needed a few words here and there. The example is mostly code and the comments are an addendum.

Then in the same project I had a line which looked exactly like this:

i += j - 1

It was preceded by five lines of comments.

Now you might be scratching your head: Hey! Before you did something much more complex and hardly put comments on it. Now you expect us to put five lines of comment on some simple arithmetic?

You’d be right: the arithmetic is simple. But that comment wasn’t about how to perform addition and subtraction. Nor was it about how variable += expression is equivalent to variable = variable + expression.

It was about the reason behind the arithmetic.

The code before was quite verbose and self-explanatory. The comments there were mostly orientation points not to let your eyes get lost in it (it’d be more obvious with syntax highlighting).

This single line is, in contrast, cryptic. It is obvious what it does of course. It might not be obvious why it does that and how that affects the rest of the code.

It is also possible to write a comment that’s too long. If you see no code on your screen — just a comment — and it’s not a big block of license information or documentation — something is probably wrong.

A comment should be as long as it takes to give the reader all the information necessary to understand the code without distracting him too much.

Hence, the five lines.
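The original five-line comment is not reproduced here, so the following is only a guess at the kind of situation in which such a line turns up. The countNumbers function is invented for this sketch: it counts runs of digits, and the comment in front of the arithmetic explains why the index moves by j - 1 rather than by j.

```go
package main

import (
	"fmt"
	"unicode"
)

// Count the runs of consecutive digits in the given text.
func countNumbers(text string) int {
	chars := []rune(text)
	count := 0

	for i := 0; i < len(chars); i++ {

		// Anything that is not a digit cannot start a number.
		if !unicode.IsDigit(chars[i]) {
			continue
		}

		// Measure the run of digits that starts at position i.
		j := 1
		for i+j < len(chars) && unicode.IsDigit(chars[i+j]) {
			j++
		}
		count++

		// The j digits starting at position i all belong to the number
		// we have just counted, so none of them may start another one.
		// The next iteration has to begin right behind the run, at
		// i + j. The loop header adds one to i on its own, hence we
		// stop one short of that here.
		i += j - 1
	}

	return count
}

func main() {
	fmt.Println(countNumbers("a12 b345 6"))
}
```

Most of the function gets one-line comments. Only the single cryptic line gets a paragraph, because it is the one place where the reader has to keep the loop header in mind to see why the code is correct.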

Learning to write comments means learning to write

There is no avoiding it. If you cannot write a properly constructed sentence your comments will suck.

Don’t get me wrong: you don’t have to be completely correct all the time. That’s downright impossible. But you need to express yourself clearly, in a reliably understandable manner. Stress on the ‘reliably’.

That means that if somebody who doesn’t know you well, but is qualified to understand the domain in which you are working, thinks you speak gibberish, you almost certainly do.

This is by no means related to being a humanities person or not. Intelligent and competent people should be able to express themselves clearly. Math and programming get harder when you don’t think clearly. So does expressing yourself.

Bad grammar is likely to be a sign of ill-formed thought.

KISS, KISS, BANG! BANG!

There is an old wisdom amongst programmers. Some say it comes from the Unix camp. Even if it doesn’t — it clearly expresses something about the spirit of Unix.

The old wisdom says “Keep It Simple, Stupid!”.

Its message is basically: do not introduce unnecessary complexity into the problem. The problem you work on is complicated on its own. You are supposed to manage the complexity, not accumulate it.

That’s true not only of your code but also of your comments. It’d be considered bad practice to have a highly varying degree of complexity in most prose (science fiction and some experimental pieces might be exceptions).

If comments’ complexity skyrockets without a warning — it’s another bad sign. And while writing

// Here be dragons!

might be funny, it’s hardly the solution (given you have the time to actually solve the issue).

An overly complex comment most likely means that you are trying to comment too much at once. If so, review the section about comment length first. But there’s more to it.

While the problem might lie in trying to comment too many lines at once, it may also be different in nature — a single line itself might be doing too much.

As always — premature optimisation is the root of most evil.

Some of what remains might come from trying to pretend to be smart — don’t. Smart people try to express themselves clearly.

Sometimes you might think: “I’ll use these fifteen arcane little features. After all, I’ve mastered the language and will always be able to understand this single line clearly upon seeing it.”

Unless what you’re doing is a well-recognised idiom, you’re probably wrong. What you write at your best you won’t understand at all at your worst and will have problems understanding at your average. Not to mention the problems others might have with your code.
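As an illustration of a single line doing too much, compare two ways of getting the number of days in a month of a non-leap year. Both functions are made up for this sketch and do not come from the project discussed above. The first packs everything into one arithmetic expression; the second looks the answer up in a table.

```go
package main

import "fmt"

// Return the number of days in month m (1 to 12) of a non-leap year.
// It works, but try writing a short comment that explains why.
func daysInMonthClever(m int) int {
	return 28 + (m+m/8)%2 + 2%m + 2*(1/m)
}

// Days in each month of a non-leap year, January first.
var monthLengths = [12]int{31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}

// Return the same number by looking it up. Nothing to explain.
func daysInMonth(m int) int {
	return monthLengths[m-1]
}

func main() {
	for m := 1; m <= 12; m++ {
		fmt.Println(m, daysInMonthClever(m), daysInMonth(m))
	}
}
```

The two agree for every month, but only the second can be checked at a glance. A comment long enough to justify the first would be far longer than the table itself.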

Final notes

I’m not gonna talk about ad hoc and throwaway programs. I planned to but this post has grown long — far longer than I expected it to be.

Of course these guidelines do not always apply. Nobody expects you to put any comments in when you’re in a code golf or obfuscation contest. And in the second case — even if you did, you’d probably do it for reasons far divergent from the rationale of this article.

Also — these are guidelines, not rules. Much about programming is more of an art than science. But I think guidelines with a point or two behind them are still worth more than nothing.

Note that this article is mainly concerned with commenting programs written in variants of the imperative paradigm, such as structured or object-oriented programming. Some of this might still apply to functional and declarative programming, but possibly in different ways.

Thanks to Cezary Dudziak, Tomasz Kucza and Mariusz Kondratowicz for reviewing drafts of this.

Photo
cjwho:


  Sitcom Apartments Floor Plans