December 5, 2004

Compilers, Binaries, Partitions and Other Common GNU/Linux Terms

Filed under: Articles — @ 11:27 pm

If you’re thinking of switching to GNU/Linux, BSD Unix, or proprietary UNIX from a Windows environment, the first obstacle you’re going to encounter is language. The terminology is a bit different from what you’re used to in Windows because the two operating systems… well, operate differently. Most people will probably be interested in switching from Windows to GNU/Linux rather than other proprietary or free Unix systems, so this article will deal mostly with GNU/Linux, although these terms will generally apply to other Unixes as well.


The Big Differences

First there are the differing naming conventions for programs. Most programs in Windows carry brand names, such as Adobe Photoshop, and it’s usually clear from the name or the vendor what programs like this can do. In GNU/Linux many programs are named with acronyms, such as The GIMP (the GNU Image Manipulation Program) and GNU itself, which stands for GNU’s Not Unix (a recursive acronym). Despite its name, GNU is a kind of Unix in terms of how it operates and the kinds of programs it can run. It is not a trademarked UNIX, and it doesn’t have a working kernel of its own. A kernel is the brain of the operating system, which handles communication directly with the hardware. Linux (which, as a name, is a conglomeration of Linus and Unix) is a complete kernel, however, so when it is combined with the GNU operating system we have GNU/Linux. Many people refer to this combination simply as “Linux,” which is technically incorrect but tends to get the point across.

In Windows (and DOS before it) uppercase and lowercase letters are totally interchangeable in the command line and in the naming of files. In Unix, upper and lowercase letters make a huge difference. COMMAND.COM and Command.com would be two completely different files in Unix, whereas in Windows they would be the same file (or at least the same file name as far as Windows is concerned).

In the Windows world, most programs (including the operating system itself) are distributed and installed as pre-assembled packages. All you do is unpack or copy certain files from the installation source (usually a CD or a downloaded installation file) to special places on the hard drive and you’re ready to use the program. This is done via an installation utility or setup program which does the copying and configuring for you.

In the Linux world, programs can be installed from human-readable (and editable) source code or from pre-assembled packages. When you install from the source code, you use a program called a compiler to translate that code into something the machine can easily understand and follow — a process known as compiling.

Aside from that, GNU/Linux and Windows differ at their most basic levels, so programs written specifically for Windows will not run on GNU/Linux without the assistance of a third-party helper. Such helpers can be emulators, which provide a Windows-like environment for the program; or virtual machines, which are programs that run a complete copy of Windows (or another operating system) inside of GNU/Linux.

Compilers, Source Code, Computer Languages

When a programmer creates a program, he or she begins by typing human-readable program code into a text editor. The word code in this context refers to words in a computer language. Examples of computer languages include C, C++, Java (not to be confused with JavaScript), BASIC, Pascal, Assembly, FORTRAN, and Lisp. There are also scripting languages which are less commonly used to create entire monolithic programs, but instead are generally used to create smaller scripts that perform a list of tasks. Such languages include Perl, Python and Ruby. Lastly there are languages made specifically for the web: PHP and ASP run on the web server to generate pages, while HTML is a markup language that describes the pages a web browser displays.

Unix was one of the first operating systems to be written in a high-level language, C, rather than in assembly. Today that language is still used in many kinds of Unix systems, including the Linux kernel and most GNU software. The derivative known as C++ is also widely used for GNU/Linux programs. Here is an example of what a small C++ program looks like:

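#include <iostream>

// Every C++ program starts running in main(), which returns an int.
int main() {
    // Send the text to standard output (the screen), followed by a newline.
    std::cout << "This is a C++ program" << std::endl;
    return 0;
}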

In this form it’s pretty easy to understand what this program does: it outputs the phrase, “This is a C++ program” (without quotes, of course) to the screen. Unfortunately the computer doesn’t understand these statements as they are, so we have to use a compiler to translate the source code into binary code (machine instructions encoded as ones and zeroes), which is all the computer can really understand at its most basic level. The compiler will translate the source code for the specific machine type that you tell it to; this means that a binary for the PowerPC processor (used in Macintosh computers) will not work on a system based on an Intel or AMD processor. You can also cross-compile, meaning you can build binaries for processor architectures other than the one you’re currently using. For Intel or AMD systems, you will see a variety of processor classes in compiler options and package names: IA32, i386, i586, i686, p3, p4, athlon, athlonxp, x86_64, and hammer. They indicate varying levels of complexity in CPU design and instruction sets. The latter two are specific to the AMD64 architecture. The others are all 32-bit classes: a binary built for an older class such as i386 will run on newer processors, but a binary built for a newer class may not run on an older processor. You should try to match up your processor type to the correct compiler flag; if in doubt, choose IA32 or i386, as they will work with any 32-bit Intel or AMD processor.

The most commonly used compiler in the Unix world today is the GNU Compiler Collection, also known as GCC. The GNU Project maintains a website for GCC if you’d like to find out more about it. Other vendors offer their own compilers as well: Intel has one, for example, and commercial proprietary compilers are available from such vendors as Borland and Microsoft.

So when we say that a program is distributed as a “binary,” that means it has already been compiled and it is ready to run. If we download the program as “source,” then we have to compile it in order to get the program to run.

Directories, Disks, Filesystems, and Partitions

Before Windows there was DOS, and DOS was text-based. In order to separate files and order them properly, they needed to be placed in directories, which are small areas of the drive that are designated to organize files that are related to one another for a certain purpose. When Windows 95 came out, Microsoft came up with a more neophyte-friendly name for directories (or more accurately, stole the term from the Mac OS): folders. The two terms are interchangeable, but the term directory is older and may be viewed by some to be “more correct.”

Usually a program will create its own directory or directories to hold its files. You can also create directories and do whatever you want to with them, although it’s a good idea not to clutter your system with useless directory entries. Directories, properly used, make it easy to find important files.

If you’re used to Windows, you may not even remember partitioning your hard drive, as you generally only do it once and Windows does it for you when you install it on a new drive. More experienced Windows users might remember the days when larger disks had to be broken up into 2GB partitions because of the partition size limit of the FAT16 filesystem used by the original Windows 95.

A partition on a hard drive is kind of like a fence. You fence off areas of the drive that you want to assign to certain tasks, programs or purposes — almost like a large directory that is physically separated from other directories. Windows likes to use one big partition that covers the whole drive; while convenient, this is not very safe or efficient. Ideally you’d have one partition for your virtual memory (Windows calls this either a swap file or a paging file, and it is hard drive space used as temporary memory when your programs need more than your RAM can provide), one partition for your startup files, one for programs, one for personal data and configuration settings, and one for the operating system’s core files. This way is more modular, meaning it separates the major components and uses them interoperably. Having your disk drive set up in this manner would allow you to more easily recover from a system file corruption, and you could reinstall your operating system without losing program data or other information. Unfortunately Windows is not designed to accommodate this partition scheme; Windows is monolithic and stores everything in one big pile of data on a single partition. If you ever have to reinstall the OS, you will also have to reinstall all of your programs because all of the registry information that is stored in Windows will be gone.

GNU and other Unix operating systems usually prefer to use multiple partitions as mentioned above. Generally these are separated into /boot (for your startup files and kernel), /usr (for your program files), /var (for system databases, system mail, and system variables), /home (for your personal data and program settings) and a separate user-inaccessible partition for your virtual memory swap file. This partition scheme can be customized or modified, and you can use a completely different (even monolithic) setup if you like. But using a partition configuration like this one, you can erase and install several different GNU/Linux distributions on the same partitions and still retain your important data (by preserving the /home directory) and at least some of your installed programs. In GNU/Linux, for instance, you can install Sun Microsystems’ StarOffice suite in your /home directory, reinstall the operating system any number of times (even format the other partitions) and still be able to use StarOffice without any change in functionality. Your documents would also still be available and unaltered.

This brings up an important point that is worth stating more clearly: GNU/Linux is a modular operating system that is made up of interoperable and interchangeable parts. Windows is a monolithic operating system that cannot have its pieces and parts separated.

In Unix operating systems, disks are named according to their method and order of connection to the motherboard. In GNU/Linux, the primary master IDE drive is known as hda, the primary slave is hdb and so on (SCSI drives are named sda, sdb and so on instead). This is true regardless of whether the drive is a hard disk or a CD-ROM. In BSD Unix, that representation is slightly different. BSD calls the primary master hard drive ad0 and the primary slave is ad1. The first CD drive in BSD is called acd0 no matter where it is on the drive chain. Other Unixes have BSD-like methods of determining drive device names.

While those may be the drive representations, each drive of course has its partitions. The first partition on the first drive in GNU/Linux is called hda1, the second partition on the first drive is hda2 and so forth. BSD uses a different method of organization; it uses disk slices to determine which parts of the hard drive will be used for BSD, and then divides its slice space up into partitions. So ad0s1a would refer to the first partition on the first slice on the first hard drive. ad0s1b would refer to the second partition on the first slice on the first hard drive. Rarely will people use more than one disk slice, so for partitions on the same drive the first five characters of the name (ad0s1 in this example) generally stay the same.

A filesystem is a method of storing, organizing and retrieving data on a drive. When you format a hard drive, you’re initializing it for use with a specific filesystem. In current versions of Windows there are two main filesystems: FAT32 (FAT stands for File Allocation Table) and NTFS (the New Technology File System). Windows 2000, XP, and 2003 use NTFS by default as it is more stable, efficient and secure than the older FAT32 filesystem, which descends from the FAT filesystem used by DOS. Data CDs generally use the ISO9660 filesystem, which is a standard across all platforms: a CD written in Windows 95 can be read in all subsequent versions of Windows as well as on a Macintosh or Sun computer, and by practically any operating system that can access CD drives.

In the Unix world there are many more filesystems. The most common in GNU/Linux is EXT2, which also has a journalled version called EXT3. IBM designed a Unix filesystem called JFS (the Journalling File System); SGI designed XFS (the eXtended File System); and a man named Hans Reiser developed the ReiserFS with grants and donations from a variety of individuals, governments, and corporations.

When you want to use a drive (or more appropriately, a filesystem on a partition on a drive), you have to let the operating system know of your intentions. Ordinarily Unix operating systems will not connect to filesystems that they don’t specifically need. A startup file called fstab (short for filesystem table) tells the OS what filesystems to connect to and how to address them. This process of connection and linking is known as mounting a filesystem. In addition to telling the operating system what drive, partition and filesystem you’re interested in using, you also have to tell it where that filesystem will be referenced in your directory tree. For instance if you have a separate partition for your personal files and settings, you would generally mount this as the /home directory and you would specify this in fstab because you’d want that to be mounted every time you boot your system. If you want to address your CD drive, you might make a directory called /cdrom to link it to. Some Unixes prefer to mount CD drives in their own directories in the root directory (the topmost directory from which all others are branched), and some prefer to mount all non-system filesystems in a directory called /mnt.

Data is stored on the drive in a certain fashion consistent with the method the filesystem demands; the organization of that data is handled by metadata. So if we were to use a book analogy, the data on your filesystem would be all of the letters in all of the words that make up the book’s content, and the metadata would be the order, spacing, formatting and punctuation that defines those letters as words, sentences, paragraphs and chapters. Without the latter, the former is pretty useless. The operating system often holds recent changes in memory and writes them to the drive later, and the filesystem is only guaranteed to be fully up to date once it has been cleanly unmounted. A power failure, system lockup or crash can therefore result in lost data or an inconsistent filesystem, because the metadata on the drive doesn’t yet reflect the most recently stored information. In a filesystem that is capable of journalling, the filesystem uses a sort of journal to keep track of the changes it is about to make. That means that after a power failure, the filesystem can read the journal and finish or discard any interrupted changes, so recently written data is reasonably safe (although there are never any guarantees with any filesystem under such conditions). For this reason EXT2 has largely been replaced by EXT3, although they share the same filesystem utilities. Essentially they are the same, with the important difference being EXT3’s journalling capabilities. JFS, ReiserFS, and XFS are all different kinds of journalling filesystems.

Most other Unixes use a variation of UFS (Unix File System), few of which are compatible with one another. Each implementation has its own benefits and liabilities, so you shouldn’t assume that the UFS that is used in Solaris has the same properties as the implementation used in FreeBSD.

Understanding Terms

Many terms can be figured out by picking apart their acronyms or by examining where in the operating system they are stored. Files you find in the /etc directory, for instance, will likely be configuration files of some kind. If you want to know what they configure, examine the name of the files and look through them with a text editor to see if they are commented with instructions.

There are a lot of different terms that don’t have obvious meanings. If you encounter language obstacles in your transition efforts, some good resources are the WhatIs database, the Google search engine, the manual pages for your OS (type man before the word in the command line) and our own message forums.

Copyright 2004 Jem Matzan. Verbatim copying and redistribution of this entire article are permitted without royalty in any medium provided this notice is preserved.
