
What is a gene? How does a gene make a protein?

To answer these questions, let's first look at some definitions.

What is the cell?

The cell is the smallest functioning unit of the body. It can replicate and repair itself, and it assembles with other cells to form tissues and organs. Cells are tiny, only 0.01 - 0.1 mm (10 - 100 micrometers) in diameter, so the substances they make and move around are tiny as well. Strictly speaking, we are not talking about substances but about particles or, better, molecules. Molecules enter the cell, and the cell produces what it needs for its own maintenance or for the function of the organ it belongs to. The cell's aqueous interior, the cell plasma, houses various factories referred to as organelles. For example, the mitochondria supply energy in the form of a small molecule (ATP) that is used in manufacturing and conversion processes. At the ribosomes, the cell synthesizes proteins. The Golgi apparatus is a membrane system for transport processes; it moves proteins and fats in and out of the cell and back and forth within the cell.

The nucleus houses the genes in the form of DNA. It has pores through which specific molecules can travel in and out. Individual genes are transcribed, and the transcripts of protein-coding genes, referred to as messenger RNA (mRNA), enter the cell plasma and serve as templates for protein synthesis.

What is a gene?

A gene is a section of DNA (deoxyribonucleic acid). The section that makes up a gene can be considered a unit that is read out to fulfill a specific task. All genes are transcribed into RNA by an enzyme called RNA polymerase, which attaches to the DNA strand and synthesizes an RNA copy. To make this possible, the chromatin loosens, the double strand opens up, and a single strand becomes accessible.

Only the protein-coding genes give rise to mRNA, which serves as a template for protein synthesis. There are also RNA genes that give rise to functional RNA molecules, which regulate the activity of genes. Such functional RNA molecules are also used therapeutically. Two examples: (1) microRNAs (micro = very small), which attach to mRNAs and thereby block protein synthesis, and (2) lncRNAs (long non-coding RNAs), which can, for example, attach to histones and facilitate transcription.


Now let's look at DNA.

1. How is DNA found in the cell?

The DNA is a long thread within the cell nucleus. It is wrapped around proteins called histones; together, DNA and histones constitute the chromatin, which is coiled into chromosomes. A human cell has 46 such threads, i.e., 46 chromosomes: 23 are inherited from the father (sperm) and 23 from the mother (ovum). This means that all human genes are distributed over 23 chromosomes and that body cells carry every gene twice.

Info: 'double set of chromosomes = all genes twice'

I'll explain this briefly, using eye color as a simplified example. Suppose the "blue eye" gene is inherited from the father and the "green eye" gene from the mother. In the cells of the iris, the mother's version is suppressed; the father's version is expressed instead, and the iris turns out blue. From this, it can be deduced that the genes of the father and mother are the same gene, meaning they provide the building instructions for a protein in the same process ("giving the iris color"). Still, the information for building the protein molecule can be slightly different.

Illustration: DNA double-strand coiled into chromosomes

2. What exactly is DNA?

Each of the 23 DNA threads has a unique length and contains unique genes, which are defined by a code based on the sequence of four nucleotide bases. DNA consists of a backbone of strung-together nucleotides (deoxyribonucleotides). Each nucleotide carries one of four possible bases: adenine (abbreviated: A), guanine (G), cytosine (C), or thymine (T). The order of the nucleotides results in a specific base sequence, which defines a code. A different arrangement of these four nucleotides results in a different sequence of bases and, thus, a different code. To describe the code, only the bases are named: A – G – T – C – T – T – A – C – C – C – G

Within the cell, DNA is normally present as a double strand. The principle of complementary base pairing (adenine always pairs with thymine, and guanine always pairs with cytosine) ensures that the code is retained when the DNA is copied.
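
The pairing rule can be tried out in a few lines of Python. This is only a minimal sketch: the function name `complement` is my own choice, and the sketch ignores that the two strands of a real double helix run in opposite directions. It uses the example sequence from above.

```python
# Pairing rules from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner strand, built base by base from the pairing rules."""
    return "".join(PAIRS[base] for base in strand)


code = "AGTCTTACCCG"      # the example sequence A - G - T - C - ...
partner = complement(code)
print(partner)            # TCAGAATGGGC

# Building the partner of the partner recovers the original code,
# which is why the code survives when the DNA is copied.
print(complement(partner) == code)   # True
```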

3. How does the length of the mRNA relate to the size of the protein?

The mRNA of protein-encoding genes is transported from the cell nucleus to the ribosomes. Amino acids, the building blocks of proteins, are brought there as well. From a defined starting point on the mRNA, each group of three nucleotide bases is matched with one amino acid (AA); there are 21 different amino acids in total. Neighboring amino acids are linked to form a chain, which is called a peptide when it is very short (fewer than 100 AA) and a polypeptide when it is longer. The chain then folds into the protein.

A polypeptide chain has only about a third as many amino acids as the coding part of the mRNA has nucleotide bases, because each amino acid is specified by three bases. If the mRNA is short, the polypeptide chain is also short, and the protein is small; if the mRNA is long, the protein is large.
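
For readers who like to experiment, here is a rough Python sketch of the two steps, from DNA to mRNA and from mRNA to a chain of amino acids. Several things in it are my own assumptions and not part of the article: the function names, the made-up example sequence, and the simplification that the mRNA is obtained by simply swapping T for U (RNA carries the base uracil, U, where DNA carries thymine). The table is the standard genetic code with the 20 standard amino acids in one-letter notation; the rare 21st amino acid, selenocysteine, is not modeled.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual U-C-A-G table order.
# One letter per amino acid; "*" marks a stop signal.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Simplified transcription: the mRNA reads like the DNA code, with U for T."""
    return coding_dna.replace("T", "U")


def translate(mrna: str) -> str:
    """Read three bases at a time from the first AUG until a stop signal."""
    start = mrna.find("AUG")          # the defined starting point
    if start == -1:
        return ""                     # no starting point, no protein
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # stop signal ends the chain
            break
        chain.append(amino_acid)
    return "".join(chain)


dna = "GGATGGCTTGGAAATAGCC"           # hypothetical example sequence
mrna = transcribe(dna)
print(mrna)                           # GGAUGGCUUGGAAAUAGCC
print(translate(mrna))                # MAWK
```

The result `MAWK` stands for a chain of four amino acids (methionine, alanine, tryptophan, lysine), far too short for a real protein but enough to show the principle.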
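
The arithmetic behind this can be written down as a small Python sketch. The function name and the example numbers are made up for illustration, and the sketch assumes that we count only the coding stretch of the mRNA, from the starting point up to and including the stop signal; the stop signal occupies three bases but adds no amino acid.

```python
def amino_acid_count(coding_bases: int) -> int:
    """Amino acids in the chain for a coding stretch of the given length.

    The stretch is assumed to begin at the starting point and to end with
    a stop signal, so its length must be a multiple of three.
    """
    if coding_bases < 6 or coding_bases % 3 != 0:
        raise ValueError("expected a multiple of three, at least six bases")
    return coding_bases // 3 - 1      # minus one for the stop signal


print(amino_acid_count(15))           # 4    (a very short peptide)
print(amino_acid_count(303))          # 100
print(amino_acid_count(3003))         # 1000 (a large protein)
```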

Illustration: Protein made from messenger RNA

Wrap up:

DNA consists of a nucleotide backbone with a specific sequence of bases that defines a code. In many sections of the DNA, the code instructs the building of specific proteins; these are protein-encoding genes. The RNA transcribed from these genes is processed into messenger RNA (mRNA). The mRNA is transported out of the cell nucleus and is the template for synthesizing a chain of amino acids (polypeptide), which then folds into a protein.

There are also RNA genes that code for functional microRNAs and long non-coding RNAs. These genes are transcribed into RNAs as well, which are then processed into functional molecules that regulate the activity of genes.

The term genome refers to the total DNA sequence contained in the 23 chromosomes. It includes over 20,000 protein-encoding genes and roughly 21,000 genes that code for functional RNAs. In addition, there are non-coding sequences that contain binding motifs for transcription factors, which control gene activity.


What is gene function in contrast to the gene?

A gene designates a unit within the DNA: a stretch of code that specifies either a messenger RNA serving as a template for protein synthesis or a functional RNA that regulates the activity of specific genes.

Gene function refers to the consequences of transcribing the gene into RNA. I take a protein-coding gene as an example:

'Function' means what the protein does in the cell and the body, and in which process it is involved.

Therefore, in non-scientific texts, gene function is equated with protein function. In the following paragraph, I give a few hints for dealing with the designations in scientific texts.

How can genes and proteins be distinguished by spelling?

Whether a text describes the gene or the protein can be seen from the spelling of the name. Genes and proteins are designated by an acronym of the full name.

  • When several related genes form a family, they are numbered.

  • The names of human proteins are always written in capital letters.

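  • Mouse names contain only the first letter as a capital letter (this is the convention for mouse genes; formal mouse nomenclature writes proteins in capital letters, but many texts use the gene spelling for both).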

  • Genes are always written in italics.

In practice, it looks like this:

Example 1

Full Gene Name: Breast Cancer 1

Acronym: BRCA1

The gene is located on chromosome 17 and encodes a protein that helps repair DNA.

The protein is spelled BRCA1. In the mouse, it is written Brca1. Mouse designations matter because non-clinical studies on the function of genes and proteins are often performed in mice.

There is a related gene, BRCA2, which is located on chromosome 13 and also codes for a DNA repair protein. The protein is written BRCA2, and in the mouse, it is Brca2.

Example 2

Iroquois homeobox genes: IRX is the acronym. There are six related genes in humans: IRX1, IRX2, IRX3, IRX4, IRX5, and IRX6. These genes code for transcription factors, i.e., DNA-binding proteins that regulate genes by switching them on and off.

The proteins are written: IRX1, IRX2, IRX3, IRX4, IRX5, and IRX6. In the mouse, the homologous genes are Irx1, Irx2, Irx3, Irx4, Irx5, and Irx6.
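
The spelling pattern for gene symbols in the two examples can be captured in a short Python sketch. The function name `gene_symbol` is hypothetical, the sketch covers only human and mouse gene symbols, and plain strings cannot show the italics that mark a gene in print.

```python
def gene_symbol(acronym: str, species: str) -> str:
    """Spell a gene symbol the way the examples above do."""
    if species == "human":
        return acronym.upper()        # all capital letters: BRCA1
    if species == "mouse":
        return acronym.capitalize()   # only the first letter: Brca1
    raise ValueError(f"no spelling rule stored for {species!r}")


print(gene_symbol("brca1", "human"))  # BRCA1
print(gene_symbol("BRCA1", "mouse"))  # Brca1

# A numbered gene family, as in Example 2:
family = [gene_symbol(f"IRX{n}", "mouse") for n in range(1, 7)]
print(family)   # ['Irx1', 'Irx2', 'Irx3', 'Irx4', 'Irx5', 'Irx6']
```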

You can think of all the genes in the genome in this way.


I will dedicate the following blog articles to therapies. We will look at genetic defects and how they manifest, and I will explain how the function of a gene can be restored.

Therapeutic products in the context of gene therapy are classified as 'Advanced Therapy Medicinal Products (ATMPs)' by the European Medicines Agency (EMA).

