Naming and arranging files

a short guide for better data management

Why naming and arranging files matters

Data and metadata are central to research. They are the basis for all the conclusions that we draw from our experiments. Keeping data organized and accessible for you, your fellow researchers and the tools that will process your data is therefore essential.

Most of the time, data is first stored as a file on a computer. That is why, to maintain good data hygiene from day one, you should think about how you are going to name and arrange your files.

Think about your file naming strategy before you start

Define a set of tokens to represent concepts

Tokens are short, concise strings that can be easily associated with a concept without being a full description. Tokens must:

  • be short and human readable so that whoever is conducting the experiments can easily identify them.
  • be unequivocal so that each one is associated with a single concept.
  • have an explicit alternative for the absence of the condition.
  • avoid spaces and other special characters.
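Some of these rules can be checked by a script. The sketch below assumes that tokens should contain only letters, digits and at most one dash (reserved for the key-value tokens described later), and it uses a hypothetical 12-character limit; `check_token` and `MAX_TOKEN_LENGTH` are illustrative names, not part of any existing tool. A script cannot tell whether a token is unequivocal or easy to read, so that part is still up to you.

```python
import re

# Hypothetical limit: choose whatever maximum suits your project.
MAX_TOKEN_LENGTH = 12

# Letters and digits only, with an optional "-value" part for key-value
# tokens such as "AUX-10uM". Underscores are excluded on purpose because
# they are used to separate tokens in a file name.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(-[A-Za-z0-9]+)?")


def check_token(token: str) -> list[str]:
    """Return a list of problems found in `token` (empty if it looks fine)."""
    problems = []
    if not token:
        problems.append("token is empty")
    if len(token) > MAX_TOKEN_LENGTH:
        problems.append(f"longer than {MAX_TOKEN_LENGTH} characters")
    if token and not TOKEN_PATTERN.fullmatch(token):
        problems.append("contains characters other than letters, digits or one dash")
    return problems


for candidate in ["HeLa", "noAUX", "AUX-10uM", "Auxin treated", "cells_2"]:
    print(candidate, check_token(candidate) or "ok")
```

Running it over the token list you plan to use is a quick way to catch a stray space or underscore before it ends up in hundreds of file names.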

Short and human readable

Tokens must be easy to remember and type. Keep them short and concise so that file names do not become too long and difficult to read.

- HeLa for the cell line
- male for the sex of your mouse
- AUX for the Auxin treatment

Unequivocal

Each token must be associated with a single concept, and only that concept, so that there is no ambiguity.
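One way to catch ambiguity early is to write your tokens down as a small vocabulary and check it. This is a minimal sketch with a made-up vocabulary: the "M" token (hypothetical) is used both for male mice and for a mutant genotype, and the male concept has two different tokens. `find_ambiguities` is an illustrative helper, not an existing tool.

```python
from collections import defaultdict

# Hypothetical vocabulary of (token, concept) pairs for one project.
VOCABULARY = [
    ("HeLa", "cell line: HeLa"),
    ("AUX", "treatment: auxin"),
    ("noAUX", "treatment: no auxin"),
    ("male", "sex: male"),
    ("M", "sex: male"),         # a second token for the same concept
    ("M", "genotype: mutant"),  # the same token reused for another concept
]


def find_ambiguities(pairs):
    """Report tokens with several meanings and concepts with several tokens."""
    concepts_by_token = defaultdict(set)
    tokens_by_concept = defaultdict(set)
    for token, concept in pairs:
        concepts_by_token[token].add(concept)
        tokens_by_concept[concept].add(token)
    ambiguous_tokens = {t: c for t, c in concepts_by_token.items() if len(c) > 1}
    redundant_concepts = {c: t for c, t in tokens_by_concept.items() if len(t) > 1}
    return ambiguous_tokens, redundant_concepts


ambiguous, redundant = find_ambiguities(VOCABULARY)
print("Tokens with more than one meaning:", ambiguous)
print("Concepts with more than one token:", redundant)
```

Keeping the vocabulary in a single file shared by everyone in the project also makes it easier to use the same token for the same concept.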

Explicit

Remember that the absence of a condition is also a condition. If you use the AUX token to indicate that a sample has been treated with auxin, you should also use a token to indicate that a sample has not been treated with auxin. For example, you could use noAUX.
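If you generate file names with a script, you can make the "absent" token impossible to forget by storing both tokens together. A minimal sketch, assuming a hypothetical `CONDITION_TOKENS` table that maps each yes/no condition to its pair of tokens:

```python
# Hypothetical table: each yes/no condition has a token for "present"
# and an explicit token for "absent".
CONDITION_TOKENS = {
    "auxin": ("AUX", "noAUX"),
}


def condition_token(condition: str, present: bool) -> str:
    """Return the token for a condition, with an explicit token for its absence."""
    with_token, without_token = CONDITION_TOKENS[condition]
    return with_token if present else without_token


print(condition_token("auxin", True))   # AUX
print(condition_token("auxin", False))  # noAUX
```

Since every sample gets a token either way, a file name with noAUX clearly means "not treated", rather than "someone forgot to write the treatment".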

Avoid spaces and other special characters

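You should avoid the space (“ ”), the dot (“.”) and other special characters (&, %, *, |, @, :, ?, …). Many of them have a special meaning for computers: the space separates arguments on the command line, the dot marks the file extension, and characters such as * and ? are wildcards.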

These characters may look convenient to you, but sooner or later they will bite you.
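Here is a small illustration of two of those bites, using only the Python standard library: `shlex.split` splits a command line the way a POSIX shell would, and `pathlib.Path` decides what the file extension is. The file names are made up for the example.

```python
import shlex
from pathlib import Path

# An unquoted command line is split on spaces, so a file called
# "HeLa AUX.tif" becomes two separate arguments.
print(shlex.split("cp HeLa AUX.tif backup/"))
# ['cp', 'HeLa', 'AUX.tif', 'backup/']

# Everything after the last dot is treated as the extension, so a dot
# inside a token changes what counts as the "extension".
name = Path("HeLa_AUX-0.5uM")
print(name.suffix)  # .5uM
print(name.stem)    # HeLa_AUX-0
```

Writing the concentration as 500nM instead of 0.5uM keeps the dot out of the token.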

😖

You may find a complete list of characters that should be avoided in file names in this Wikipedia article.

Tokens containing key-value pairs

Sometimes you want to be specific about a condition. For example, you may want to indicate the concentration of a drug that you used, or the time at which you collected a sample. In those cases, you can use a key and a value to define a token.

- AUX-10uM for a sample treated with 10 µM (“u” for “µ”) of Auxin
- hpt-24h for a sample collected 24 hours post transfection

  • Always use the same key for the same condition.
  • Use a dash (“-”) to separate the key from the value.
  • Always indicate units when using a numeric value.
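The three rules above are easy to encode in a pair of helper functions. This is a minimal sketch, assuming tokens of the form key, dash, whole number, unit (as in AUX-10uM or hpt-24h); `make_kv_token` and `parse_kv_token` are hypothetical names, and the restriction to whole numbers is an assumption that also keeps dots out of file names.

```python
import re

# Assumed shape of a key-value token: "<key>-<whole number><unit>".
KV_PATTERN = re.compile(r"([A-Za-z]+)-([0-9]+)([A-Za-z]+)")


def make_kv_token(key: str, value: int, unit: str) -> str:
    """Build a key-value token such as 'hpt-24h'; a unit is always required."""
    if not unit:
        raise ValueError("numeric values must carry a unit")
    return f"{key}-{value}{unit}"


def parse_kv_token(token: str) -> tuple[str, int, str]:
    """Split a token like 'AUX-10uM' back into ('AUX', 10, 'uM')."""
    match = KV_PATTERN.fullmatch(token)
    if match is None:
        raise ValueError(f"not a key-value token: {token!r}")
    key, value, unit = match.groups()
    return key, int(value), unit


print(make_kv_token("AUX", 10, "uM"))  # AUX-10uM
print(parse_kv_token("hpt-24h"))       # ('hpt', 24, 'h')
```

Because the parser refuses tokens without a unit, a name such as hpt-24 is rejected instead of being silently misread.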

Arranging tokens in a file name

Tokens must be arranged logically, both in the path to your file and in the file name itself.

Separating tokens

Tokens must be separated by a dedicated separator character. We strongly recommend the underscore (“_”).

Needless to say, never use that separator in the tokens themselves.
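With a single separator that never appears inside a token, a file name can be split back into its tokens without any guesswork. A minimal sketch, with hypothetical helpers `build_name` and `split_name`; it assumes the only dot in the name is the one before the extension:

```python
from pathlib import Path

SEPARATOR = "_"


def build_name(tokens: list[str], extension: str) -> str:
    """Join tokens with the separator, refusing tokens that contain it."""
    for token in tokens:
        if SEPARATOR in token:
            raise ValueError(f"token {token!r} contains the separator {SEPARATOR!r}")
    return SEPARATOR.join(tokens) + extension


def split_name(file_name: str) -> list[str]:
    """Recover the tokens from a name built by build_name (extension dropped)."""
    return Path(file_name).stem.split(SEPARATOR)


name = build_name(["HeLa", "AUX-10uM", "hpt-24h"], ".tif")
print(name)              # HeLa_AUX-10uM_hpt-24h.tif
print(split_name(name))  # ['HeLa', 'AUX-10uM', 'hpt-24h']
```

If a token contained an underscore, `split_name` would return one token too many, which is exactly why `build_name` rejects it.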


From general to specific

It is very important to order tokens consistently, from general to specific. This makes file names much easier to “read” and your data much easier to sort.
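A fixed category order is one way to enforce this. The sketch below assumes a hypothetical order (cell line, treatment, time point, replicate) and hypothetical replicate tokens rep1 and rep2; the samples are made up. Note that Python's plain `sorted` compares characters by code point, so uppercase letters sort before lowercase ones.

```python
# Hypothetical category order, from most general to most specific.
CATEGORY_ORDER = ["cell_line", "treatment", "timepoint", "replicate"]


def ordered_tokens(sample: dict[str, str]) -> list[str]:
    """Return a sample's tokens from general to specific.

    A missing category raises KeyError, so no token is silently left out.
    """
    return [sample[category] for category in CATEGORY_ORDER]


samples = [
    {"timepoint": "hpt-24h", "cell_line": "HeLa", "treatment": "noAUX", "replicate": "rep2"},
    {"treatment": "AUX-10uM", "replicate": "rep1", "cell_line": "HeLa", "timepoint": "hpt-24h"},
    {"cell_line": "HeLa", "treatment": "noAUX", "timepoint": "hpt-24h", "replicate": "rep1"},
]

names = sorted("_".join(ordered_tokens(s)) + ".tif" for s in samples)
for name in names:
    print(name)
```

Whatever order the metadata arrives in, the names come out with the same token order, and sorting them groups files by cell line first, then by treatment, and so on.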

Use the same logic for arranging data in folders

You are not going to store all your files in a single directory, are you? You will probably want to arrange them in folders, and you can use the same general-to-specific logic that you use for tokens in file names to arrange those folders.
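As an illustration, the most general tokens can become folder names. This is a minimal sketch, assuming a hypothetical `FOLDER_LEVELS` setting of 2 and a made-up root folder called data. One design choice here is that the file name still carries all the tokens, so a file stays identifiable even if it is copied out of its folder.

```python
from pathlib import Path

# Hypothetical setting: how many of the most general tokens become folders.
FOLDER_LEVELS = 2


def token_path(root: str, tokens: list[str], extension: str) -> Path:
    """Use the first FOLDER_LEVELS tokens as folders; keep every token in the name."""
    folders = tokens[:FOLDER_LEVELS]
    file_name = "_".join(tokens) + extension
    return Path(root).joinpath(*folders, file_name)


path = token_path("data", ["HeLa", "AUX-10uM", "hpt-24h", "rep1"], ".tif")
print(path.as_posix())
# data/HeLa/AUX-10uM/HeLa_AUX-10uM_hpt-24h_rep1.tif
```

The function only computes the path; calling `path.parent.mkdir(parents=True, exist_ok=True)` would create the folders before you save the file.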