Thursday 14 April 2016

OS/2 OPERATING SYSTEM

In the late 1980s, two of the biggest names in the PC world, IBM and Microsoft, joined forces
to create OS/2, with the goal of making it the "next big thing" in graphical operating systems. Well,
it didn't quite work out that way. :^) The story behind OS/2 includes some of the most fascinating
bits of PC industry history, but it's a long story and not one that really makes sense to get into here.
The short version goes something like this:

• Microsoft and IBM create OS/2 with high hopes that it will revolutionize the PC desktop.
• OS/2 has some significant technical strengths but also some problems.
• Microsoft and IBM fight over how to fix the problems, and also over what direction to take for the
future of the operating system.
• Microsoft decides, based on some combination of frustration over the problems and a desire for
absolute control, to drop OS/2 and focus on Windows instead.
• IBM and Microsoft feud.
• IBM supports OS/2 (somewhat half-heartedly) on its own, while Microsoft dominates the industry
with various versions of Windows.
Now, OS/2 aficionados will probably take issue with at least some of that summary, but that is
what happened in a nutshell, or at least I think so. :^) At any rate, OS/2 continues to be supported
today, but really has been relegated to a niche role. I don't know how long IBM will continue to
support it.
OS/2's file system support is similar, in a way, to that of Windows NT. OS/2 supports FAT12 and
FAT16 for compatibility, but it is really designed to use its own special file system, called HPFS.
HPFS is similar to NTFS (NT's native file system), though it is certainly not the same. OS/2 does not
have support for FAT32 built in, but there are third-party tools available that will let OS/2
access FAT32 partitions. This may be required if you are running a machine with both OS/2 and
Windows partitions. I believe that OS/2 does not include support for NTFS partitions.

UNIX operating system. Main features and commands. UNIX / Linux

UNIX is one of the very oldest operating systems in the computer world, and is still widely
used today. However, it is not a very conspicuous operating system. Somewhat arcane in its
operation and interface, it is ideally suited for the needs of large enterprise computing systems. It is
also the most common operating system run by servers and other computers that form the bulk of
the Internet. While you may never use UNIX on your local PC, you are using it indirectly, in one
form or another, every time you log on to the 'net.

While few people run UNIX on their own systems, there are in fact a number of different
versions of UNIX available for the PC, and millions of PC users have chosen to install "UNIXy"
operating systems on their own desktop machines. There are dozens of variants of the basic UNIX
interface; the most popular one for the PC platform is Linux, which is itself available in many
flavors. While UNIX operating systems can be difficult to set up and require some knowledge to
operate, they are very stable and robust, are efficient with system resources--and are generally free
or very inexpensive to obtain.

UNIX operating systems are designed to use the "UNIX file system". I put that phrase in
quotes, because there is no single UNIX file system, any more than there is a single UNIX
operating system. However, the file systems used by most of the UNIX operating system types out
there are fairly similar, and rather distinct from the file systems used by other operating systems,
such as DOS or Windows.

As an operating system geared specifically for use on the PC, Linux is the UNIX variant that
gets the most attention in PC circles. To improve its appeal, the programmers who are continually
working to update and improve Linux have built into the operating system compatibility support for
most of the other file systems out there. Linux will read and write to FAT partitions, and with
newer versions this includes FAT32.

Unix (officially trademarked as UNIX®) is a computer operating system originally
developed in 1969 by a group of AT&T employees at Bell Labs including Ken Thompson, Dennis
Ritchie and Douglas McIlroy. Today's Unix systems are split into various branches, developed over
time by AT&T as well as various commercial vendors and non-profit organizations.

As of 2007, the owner of the trademark UNIX® is The Open Group, an industry standards
consortium. Only systems fully compliant with and certified to the Single UNIX Specification
qualify as "UNIX®" (others are called "Unix system-like" or "Unix-like").

During the late 1970s and early 1980s, Unix's influence in academic circles led to large-scale
adoption of Unix (particularly of the BSD variant, originating from the University of
California, Berkeley) by commercial startups, the most notable of which is Sun Microsystems.
Today, in addition to certified Unix systems, Unix-like operating systems such as Linux and BSD
derivatives are commonly encountered.

Sometimes, "traditional Unix" may be used to describe a Unix or an operating system that has
the characteristics of either Version 7 Unix or UNIX System V.

Overview
Unix operating systems are widely used in both servers and workstations. The Unix
environment and the client-server program model were essential elements in the development of the
Internet and the reshaping of computing as centered in networks rather than in individual
computers.

Both Unix and the C programming language were developed by AT&T and distributed to
government and academic institutions, causing both to be ported to a wider variety of machine
families than any other operating system. As a result, Unix became synonymous with "open
systems".

Unix was designed to be portable, multi-tasking and multi-user in a time-sharing
configuration. Unix systems are characterized by various concepts: the use of plain text for storing
data; a hierarchical file system; treating devices and certain types of inter-process communication
(IPC) as files; and the use of a large number of small programs that can be strung together through a
command line interpreter using pipes, as opposed to using a single monolithic program that includes
all of the same functionality. These concepts are known as the Unix philosophy.
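As a rough, hedged illustration of this philosophy (not part of the original text, and assuming a POSIX system), the following C sketch shows how a shell-like program can string two small programs together with a pipe, in the spirit of "ls | wc -l"; each program stays small and single-purpose, and the kernel's pipe facility does the plumbing:

    /* Hypothetical sketch: connect "ls" to "wc -l" through a pipe. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];                          /* fd[0] = read end, fd[1] = write end */
        if (pipe(fd) == -1) { perror("pipe"); return 1; }

        if (fork() == 0) {                  /* first child: the producer, "ls" */
            dup2(fd[1], STDOUT_FILENO);     /* its stdout now feeds the pipe */
            close(fd[0]); close(fd[1]);
            execlp("ls", "ls", (char *)NULL);
            perror("execlp ls"); _exit(127);
        }
        if (fork() == 0) {                  /* second child: the consumer, "wc -l" */
            dup2(fd[0], STDIN_FILENO);      /* its stdin now reads from the pipe */
            close(fd[0]); close(fd[1]);
            execlp("wc", "wc", "-l", (char *)NULL);
            perror("execlp wc"); _exit(127);
        }
        close(fd[0]); close(fd[1]);         /* parent keeps no pipe ends open */
        while (wait(NULL) > 0)              /* reap both children */
            ;
        return 0;
    }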

Under Unix, the "operating system" consists of many of these utilities along with the master
control program, the kernel. The kernel provides services to start and stop programs, handle the file
system and other common "low level" tasks that most programs share, and, perhaps most
importantly, schedules access to hardware to avoid conflicts if two programs try to access the same
resource or device simultaneously. To mediate such access, the kernel was given special rights on
the system, leading to the division between user-space and kernel-space.
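As a minimal sketch of that boundary (again assuming a POSIX system), even a one-line program crosses from user space into kernel space whenever it performs I/O through a system call such as write():

    #include <unistd.h>

    int main(void)
    {
        const char msg[] = "hello from user space\n";
        /* File descriptor 1 is standard output; the actual transfer to the
         * terminal device is performed by the kernel, which schedules and
         * mediates access to the hardware. */
        write(1, msg, sizeof msg - 1);
        return 0;
    }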

The microkernel approach tried to reverse the growing size of kernels and return to a system in which
most tasks were completed by smaller utilities. In an era when a "normal" computer consisted of a
hard disk for storage and a data terminal for input and output (I/O), the Unix file model worked
quite well as most I/O was "linear". However, modern systems include networking and other new
devices. Describing a graphical user interface driven by mouse control in an "event driven" fashion
didn't work well under the old model. Work on systems supporting these new devices in the 1980s
led to facilities for non-blocking I/O, forms of inter-process communications other than just pipes,
as well as moving functionality such as network protocols out of the kernel.

History

A partial list of simultaneously running processes on a Unix system.

In the 1960s, the Massachusetts Institute of Technology, AT&T Bell Labs, and General
Electric worked on an experimental operating system called Multics (Multiplexed Information and
Computing Service), which was designed to run on the GE-645 mainframe computer. The aim was
the creation of a commercial product, although this was never a great success. Multics was an
interactive operating system with many novel capabilities, including enhanced security. The project
did develop production releases, but initially these releases performed poorly.

AT&T Bell Labs pulled out and deployed its resources elsewhere. One of the developers on
the Bell Labs team, Ken Thompson, continued to develop for the GE-645 mainframe, and wrote a
game for that computer called Space Travel. However, he found that the game was too slow on the
GE machine and was expensive, costing $75 per execution in scarce computing time.

Thompson thus re-wrote the game in assembly language for Digital Equipment Corporation's
PDP-7 with help from Dennis Ritchie. This experience, combined with his work on the Multics
project, led Thompson to start a new operating system for the PDP-7. Thompson and Ritchie led a
team of developers, including Rudd Canaday, at Bell Labs developing a file system as well as the
new multi-tasking operating system itself. They included a command line interpreter and some
small utility programs.

Editing a shell script using the ed editor. The dollar-sign at the top of the screen is the prompt
printed by the shell. 'ed' is typed to start the editor, which takes over from that point on the screen
downwards.

1970s

In 1970 the project was named Unics, and could - eventually - support two simultaneous
users. Brian Kernighan invented this name as a contrast to Multics; the spelling was later changed
to Unix.

Up until this point there had been no financial support from Bell Labs. When the Computer
Science Research Group wanted to use Unix on a much larger machine than the PDP-7, Thompson
and Ritchie managed to trade the promise of adding text processing capabilities to Unix for a PDP-
11/20 machine. This led to some financial support from Bell. For the first time in 1970, the Unix
operating system was officially named and ran on the PDP-11/20. It added a text formatting
program called roff and a text editor. All three were written in PDP-11/20 assembly language. Bell
Labs used this initial "text processing system", made up of Unix, roff, and the editor, for text
processing of patent applications. Roff soon evolved into troff, the first electronic publishing
program with a full typesetting capability. The UNIX Programmer's Manual was published on
November 3, 1971.

In 1973, Unix was rewritten in the C programming language, contrary to the general notion
at the time "that something as complex as an operating system, which must deal with time-critical
events, had to be written exclusively in assembly language" [4]. The migration from assembly
language to the higher-level language C resulted in much more portable software, requiring only a
relatively small amount of machine-dependent code to be replaced when porting Unix to other
computing platforms.

AT&T made Unix available to universities and commercial firms, as well as the United
States government under licenses. The licenses included all source code, including the machine-dependent
parts of the kernel, which were written in PDP-11 assembly code. Copies of the
annotated Unix kernel sources circulated widely in the late 1970s in the form of a much-copied
book by John Lions of the University of New South Wales, the Lions' Commentary on UNIX 6th
Edition, with Source Code, which led to considerable use of Unix as an educational example.

Versions of the Unix system were determined by editions of its user manuals, so that (for
example) "Fifth Edition UNIX" and "UNIX Version 5" have both been used to designate the same
thing. Development expanded, with Versions 4, 5, and 6 being released by 1975. These versions
added the concept of pipes, leading to the development of a more modular code-base, increasing
development speed still further. Version 5 and especially Version 6 led to a plethora of different
Unix versions both inside and outside Bell Labs, including PWB/UNIX, IS/1 (the first commercial
Unix), and the University of Wollongong's port to the Interdata 7/32 (the first non-PDP Unix).

In 1978, UNIX/32V, for the VAX system, was released. By this time, over 600 machines
were running Unix in some form. Version 7 Unix, the last version of Research Unix to be released
widely, was released in 1979. Versions 8, 9 and 10 were developed through the 1980s but were only
released to a few universities, though they did generate papers describing the new work. This
research led to the development of Plan 9 from Bell Labs, a new portable distributed system.

1980s

A late-80s style Unix desktop running the X Window System graphical user interface. Shown
are a number of client applications common to the MIT X Consortium's distribution, including
Tom's Window Manager, an X Terminal, Xbiff, xload, and a graphical manual page browser.

AT&T now licensed UNIX System III, based largely on Version 7, for commercial use, the
first version launching in 1982. This also included support for the VAX. AT&T continued to issue
licenses for older Unix versions. To end the confusion between all its differing internal versions,
AT&T combined them into UNIX System V Release 1. This introduced a few features such as the
vi editor and curses from the Berkeley Software Distribution of Unix developed at the University of
California, Berkeley. This also included support for the Western Electric 3B series of machines.

Since the newer commercial UNIX licensing terms were not as favorable for academic use
as the older versions of Unix, the Berkeley researchers continued to develop BSD Unix as an
alternative to UNIX System III and V, originally on the PDP-11 architecture (the 2.xBSD releases,
ending with 2.11BSD) and later for the VAX-11 (the 4.x BSD releases). Many contributions to
Unix first appeared on BSD systems, notably the C shell with job control (modelled on ITS).
Perhaps the most important aspect of the BSD development effort was the addition of TCP/IP
network code to the mainstream Unix kernel. The BSD effort produced several significant releases
that contained network code: 4.1cBSD, 4.2BSD, 4.3BSD, 4.3BSD-Tahoe ("Tahoe" being the
nickname of the Computer Consoles Inc. Power 6/32 architecture that was the first non-DEC
release of the BSD kernel), Net/1, 4.3BSD-Reno (named to match "Tahoe", and because the release
was something of a gamble), Net/2, 4.4BSD, and 4.4BSD-lite. The network code found in these
releases is the ancestor of much TCP/IP network code in use today, including code that was later
released in AT&T System V UNIX and early versions of Microsoft Windows. The accompanying
Berkeley Sockets API is a de facto standard for networking APIs and has been copied on many
platforms.

Other companies began to offer commercial versions of the UNIX System for their own
mini-computers and workstations. Most of these new Unix flavors were developed from the System
V base under a license from AT&T; however, others were based on BSD instead. One of the
leading developers of BSD, Bill Joy, went on to co-found Sun Microsystems in 1982 and create
SunOS (now Solaris) for their workstation computers. In 1980, Microsoft announced its first Unix
for 16-bit microcomputers called Xenix, which the Santa Cruz Operation (SCO) ported to the Intel
8086 processor in 1983, and eventually branched Xenix into SCO UNIX in 1989.

For a few years during this period (before PC compatible computers with MS-DOS became
dominant), industry observers expected that UNIX, with its portability and rich capabilities, was
likely to become the industry standard operating system for microcomputers.[5] In 1984 several
companies established the X/Open consortium with the goal of creating an open system
specification based on UNIX. Despite early progress, the standardization effort collapsed into the
"Unix wars," with various companies forming rival standardization groups. The most successful
Unix-related standard turned out to be the IEEE's POSIX specification, designed as a compromise
API readily implemented on both BSD and System V platforms, published in 1988 and soon
mandated by the United States government for many of its own systems.

AT&T added various features into UNIX System V, such as file locking, system
administration, streams, new forms of IPC, the Remote File System and TLI. AT&T cooperated
with Sun Microsystems and between 1987 and 1989 merged features from Xenix, BSD, SunOS,
and System V into System V Release 4 (SVR4), independently of X/Open. This new release
consolidated all the previous features into one package, and heralded the end of competing versions.
It also increased licensing fees.

During this time a number of vendors including Digital Equipment, Sun, Addamax and
others began building trusted versions of UNIX for high security applications, mostly designed for
military and law enforcement applications.

The Common Desktop Environment or CDE, a graphical desktop for Unix co-developed in
the 1990s by HP, IBM, and Sun as part of the COSE initiative.

1990s

In 1990, the Open Software Foundation released OSF/1, their standard Unix
implementation, based on Mach and BSD. The Foundation was started in 1988 and was funded by
several Unix-related companies that wished to counteract the collaboration of AT&T and Sun on
SVR4. Subsequently, AT&T and another group of licensees formed the group "UNIX International"
in order to counteract OSF. This escalation of conflict between competing vendors gave rise again
to the phrase "Unix wars".

In 1991, a group of BSD developers (Donn Seeley, Mike Karels, Bill Jolitz, and Trent Hein)
left the University of California to found Berkeley Software Design, Inc (BSDI). BSDI produced a
fully functional commercial version of BSD Unix for the inexpensive and ubiquitous Intel platform,
which started a wave of interest in the use of inexpensive hardware for production computing.
Shortly after it was founded, Bill Jolitz left BSDI to pursue distribution of 386BSD, the free
software ancestor of FreeBSD, OpenBSD, and NetBSD.

By 1993 most commercial vendors had changed their variants of Unix to be based on
System V with many BSD features added on top. The creation of the COSE initiative that year by
the major players in Unix marked the end of the most notorious phase of the Unix wars, and was
followed by the merger of UI and OSF in 1994. The new combined entity, which retained the OSF
name, stopped work on OSF/1 that year. By that time the only vendor using it was Digital, which
continued its own development, rebranding their product Digital UNIX in early 1995.

Shortly after UNIX System V Release 4 was produced, AT&T sold all its rights to UNIX®
to Novell. (Dennis Ritchie likened this to the Biblical story of Esau selling his birthright for the
proverbial "mess of pottage".[6]) Novell developed its own version, UnixWare, merging its NetWare
with UNIX System V Release 4. Novell tried to use this to battle against Windows NT, but their
core markets suffered considerably.

In 1993, Novell decided to transfer the UNIX® trademark and certification rights to the
X/Open Consortium.[7] In 1996, X/Open merged with OSF, creating the Open Group. Various
standards by the Open Group now define what is and what is not a "UNIX" operating system,
notably the post-1998 Single UNIX Specification.

In 1995, the business of administering and supporting the existing UNIX licenses, plus
rights to further develop the System V code base, were sold by Novell to the Santa Cruz
Operation.[1] Whether Novell also sold the copyrights is currently the subject of litigation (see
below).

In 1997, Apple Computer sought out a new foundation for its Macintosh operating system
and chose NEXTSTEP, an operating system developed by NeXT. The core operating system was
renamed Darwin after Apple acquired it. It was based on the BSD family and the Mach kernel. The
deployment of Darwin BSD Unix in Mac OS X makes it, according to a statement made by an
Apple employee at a USENIX conference, the most widely used Unix-based system in the desktop
computer market.

2000 to present

Fig. 8. A modern Unix desktop environment (Solaris 10)

In 2000, SCO sold its entire UNIX business and assets to Caldera Systems, which later on
changed its name to The SCO Group. This new player then started legal action against various users
and vendors of Linux. SCO have alleged that Linux contains copyrighted Unix code now owned by
The SCO Group. Other allegations include trade-secret violations by IBM, or contract violations by
former Santa Cruz customers who have since converted to Linux. However, Novell disputed the
SCO group's claim to hold copyright on the UNIX source base. According to Novell, SCO (and
hence the SCO Group) are effectively franchise operators for Novell, which also retained the core
copyrights, veto rights over future licensing activities of SCO, and 95% of the licensing revenue.
The SCO Group disagreed with this, and the dispute had resulted in the SCO v. Novell lawsuit.

In 2005, Sun Microsystems released the bulk of its Solaris system code (based on UNIX
System V Release 4) into an open source project called OpenSolaris. New Sun OS technologies
such as the ZFS file system are now first released as open source code via the OpenSolaris project;
as of 2006 it has spawned several non-Sun distributions such as SchilliX, Belenix, Nexenta and
MarTux.

The Dot-com crash has led to significant consolidation of Unix users as well. Of the many
commercial flavors of Unix that were born in the 1980s, only Solaris, HP-UX, and AIX are still
doing relatively well in the market, though SGI's IRIX persisted for quite some time. Of these,
Solaris has the most market share, and may be gaining popularity due to its feature set and also
since it now has an Open Source version.

Standards

Beginning in the late 1980s, an open operating system standardization effort now known as
POSIX provided a common baseline for all operating systems; IEEE based POSIX around the
common structure of the major competing variants of the Unix system, publishing the first POSIX
standard in 1988. In the early 1990s a separate but very similar effort was started by an industry
consortium, the Common Open Software Environment (COSE) initiative, which eventually became
the Single UNIX Specification administered by The Open Group. Starting in 1998 the Open Group
and IEEE started the Austin Group, to provide a common definition of POSIX and the Single UNIX
Specification.

In an effort towards compatibility, in 1999 several Unix system vendors agreed on SVR4's
Executable and Linkable Format (ELF) as the standard for binary and object code files. The
common format allows substantial binary compatibility among Unix systems operating on the same
CPU architecture.

The Filesystem Hierarchy Standard was created to provide a reference directory layout for
Unix-like operating systems, particularly Linux. This type of standard however is controversial, and
even within the Linux community its adoption is far from universal.

Components

The Unix system is composed of several components that are normally packaged together. By
including — in addition to the kernel of an operating system — the development environment,
libraries, documents, and the portable, modifiable source-code for all of these components, Unix
was a self-contained software system. This was one of the key reasons it emerged into an important
teaching and learning tool and had such a broad influence.

Inclusion of these components did not make the system large — the original V7 UNIX
distribution, consisting of copies of all of the compiled binaries plus all of the source code and
documentation occupied less than 10 MB, and arrived on a single 9-track magtape. The printed
documentation, typeset from the on-line sources, was contained in two volumes.

The names and filesystem locations of the Unix components have changed substantially across
the history of the system. Nonetheless, the V7 implementation is considered by many to have the
canonical early structure:

• Kernel — source code in /usr/sys, composed of several sub-components:
o conf — configuration and machine-dependent parts, including boot code
o dev — device drivers for control of hardware (and some pseudo-hardware)
o sys — operating system "kernel", handling memory management, process
scheduling, system calls, etc.
o h — header files, defining key structures within the system and important
system-specific invariants

• Development Environment — Early versions of Unix contained a development
environment sufficient to recreate the entire system from source code:

o cc — C language compiler (first appeared in V3 Unix)
o as — machine-language assembler for the machine
o ld — linker, for combining object files
o lib — object-code libraries (installed in /lib or /usr/lib). libc, the system library with
C run-time support, was the primary library, but there have always been additional
libraries for such things as mathematical functions (libm) or database access. V7
Unix introduced the first version of the modern "Standard I/O" library stdio as part
of the system library. Later implementations increased the number of libraries
significantly.
o make — build manager (introduced in PWB/UNIX), for effectively automating the
build process
o include — header files for software development, defining standard interfaces and
system invariants
o Other languages — V7 Unix contained a Fortran-77 compiler, a programmable
arbitrary-precision calculator (bc, dc), and the awk "scripting" language, and later
versions and implementations contain many other language compilers and toolsets.
Early BSD releases included Pascal tools, and many modern Unix systems also
include the GNU Compiler Collection as well as or instead of a proprietary compiler
system.
o Other tools — including an object-code archive manager (ar), symbol-table lister
(nm), compiler-development tools (e.g. lex & yacc), and debugging tools.

• Commands — Unix makes little distinction between commands (user-level programs) for
system operation and maintenance (e.g. cron), commands of general utility (e.g. grep), and
more general-purpose applications such as the text formatting and typesetting package.
Nonetheless, some major categories are:

o sh — The "shell" programmable command-line interpreter, the primary user
interface on Unix before window systems appeared, and even afterward (within a
"command window").
o Utilities — the core tool kit of the Unix command set, including cp, ls, grep, find and
many others. Subcategories include:
§ System utilities — administrative tools such as mkfs, fsck, and many others
§ User utilities — environment management tools such as passwd, kill, and
others.
o Document formatting — Unix systems were used from the outset for document
preparation and typesetting systems, and included many related programs such as
nroff, troff, tbl, eqn, refer, and pic. Some modern Unix systems also include
packages such as TeX and GhostScript.
o Graphics — The plot subsystem provided facilities for producing simple vector plots
in a device-independent format, with device-specific interpreters to display such
files. Modern Unix systems also generally include X11 as a standard windowing
system and GUI, and many support OpenGL.
o Communications — Early Unix systems contained no inter-system communication,
but did include the inter-user communication programs mail and write. V7
introduced the early inter-system communication system UUCP, and systems
beginning with BSD release 4.1c included TCP/IP utilities.

The 'man' command can display a 'man page' for every command on the system, including itself.
• Documentation — Unix was the first operating system to include all of its documentation
online in machine-readable form. The documentation included:
o man — manual pages for each command, library component, system call, header
file, etc.
o doc — longer documents detailing major subsystems, such as the C language and
troff

Impact

The Unix system had significant impact on other operating systems.

It was written in high level language as opposed to assembly language (which had been
thought necessary for systems implementation on early computers). Although this followed the lead
of Multics and Burroughs, it was Unix that popularized the idea.

Unix had a drastically simplified file model compared to many contemporary operating
systems, treating all kinds of files as simple byte arrays. The file system hierarchy contained
machine services and devices (such as printers, terminals, or disk drives), providing a uniform
interface, but at the expense of occasionally requiring additional mechanisms such as ioctl and
mode flags to access features of the hardware that did not fit the simple "stream of bytes" model.
The Plan 9 operating system pushed this model even further and eliminated the need for additional
mechanisms.
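Both sides of this model can be sketched in a few lines of C. The sketch below assumes a POSIX system that provides the TIOCGWINSZ terminal ioctl; the file path used is purely illustrative:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>

    int main(void)
    {
        /* A regular file, a pipe, or a device can all be read the same way:
         * as a plain stream of bytes, with no record structure. */
        char buf[64];
        int fd = open("/etc/hostname", O_RDONLY);   /* illustrative path */
        if (fd != -1) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0)
                write(1, buf, (size_t)n);
            close(fd);
        }

        /* A terminal's window size is not part of the byte stream, so it is
         * exposed through the ioctl() escape hatch instead. */
        struct winsize ws;
        if (ioctl(0, TIOCGWINSZ, &ws) == 0)
            printf("terminal: %d rows x %d columns\n", ws.ws_row, ws.ws_col);
        return 0;
    }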

Unix also popularized the hierarchical file system with arbitrarily nested subdirectories,
originally introduced by Multics. Other common operating systems of the era had ways to divide a
storage device into multiple directories or sections, but they had a fixed number of levels, often only
one level. Several major proprietary operating systems eventually added recursive subdirectory
capabilities also patterned after Multics. DEC's RSX-11M's "group, user" hierarchy evolved into
VMS directories, CP/M's volumes evolved into MS-DOS 2.0+ subdirectories, and HP's MPE
group.account hierarchy and IBM's SSP and OS/400 library systems were folded into broader
POSIX file systems.

Making the command interpreter an ordinary user-level program, with additional commands
provided as separate programs, was another Multics innovation popularized by Unix. The Unix
shell used the same language for interactive commands as for scripting (shell scripts — there was
no separate job control language like IBM's JCL). Since the shell and OS commands were "just
another program", the user could choose (or even write) his own shell. New commands could be
added without changing the shell itself. Unix's innovative command-line syntax for creating chains
of producer-consumer processes (pipelines) made a powerful programming paradigm (coroutines)
widely available. Many later command-line interpreters have been inspired by the Unix shell.
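To make the point that the shell is "just another program", here is a hypothetical toy command loop in C; it supports no pipes, quoting, or job control, and only illustrates the read/fork/exec/wait pattern on which an ordinary user-level shell is built:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        char line[256];
        char *argv[32];

        for (;;) {
            fputs("$ ", stdout);                  /* the traditional prompt */
            fflush(stdout);
            if (fgets(line, sizeof line, stdin) == NULL)
                break;                            /* end of input: exit the "shell" */

            int argc = 0;                         /* split the line into words */
            for (char *tok = strtok(line, " \t\n");
                 tok != NULL && argc < 31;
                 tok = strtok(NULL, " \t\n"))
                argv[argc++] = tok;
            argv[argc] = NULL;
            if (argc == 0)
                continue;

            if (fork() == 0) {                    /* child: become the command */
                execvp(argv[0], argv);
                perror(argv[0]);
                _exit(127);
            }
            wait(NULL);                           /* parent: wait for it to finish */
        }
        return 0;
    }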

A fundamental simplifying assumption of Unix was its focus on ASCII text for nearly all
file formats. There were no "binary" editors in the original version of Unix — the entire system was
configured using textual shell command scripts. The common denominator in the I/O system is the
byte — unlike "record-based" file systems in other computers. The focus on text for representing
nearly everything made Unix pipes especially useful, and encouraged the development of simple,
general tools that could be easily combined to perform more complicated ad hoc tasks. The focus
on text and bytes made the system far more scalable and portable than other systems. Over time,
text-based applications have also proven popular in application areas, such as printing languages
(PostScript), and at the application layer of the Internet Protocols, e.g. Telnet, FTP, SSH, SMTP,
HTTP and SIP.

Unix popularised a syntax for regular expressions that found widespread use. The Unix
programming interface became the basis for a widely implemented operating system interface
standard (POSIX, see above).

The C programming language soon spread beyond Unix, and is now ubiquitous in systems and
applications programming.

Early Unix developers were important in bringing the theory of modularity and reusability into
software engineering practice, spawning a "Software Tools" movement.

Unix provided the TCP/IP networking protocol on relatively inexpensive computers, which
contributed to the Internet explosion of world-wide real-time connectivity, and which formed the
basis for implementations on many other platforms. (This also exposed numerous security holes in
the networking implementations.)

The Unix policy of extensive on-line documentation and (for many years) ready access to all system
source code raised programmer expectations, and contributed to the 1983 launch of the free software
movement.

Over time, the leading developers of Unix (and programs that ran on it) evolved a set of
cultural norms for developing software, norms which became as important and influential as the
technology of Unix itself; this has been termed the Unix philosophy.

Unix stores system time values as the number of seconds from midnight January 1, 1970
(the "Unix Epoch") in variables of type time_t, historically defined as "signed 32-bit integer". On
January 19, 2038, the current time will roll over from a zero followed by 31 ones
(01111111111111111111111111111111) to a one followed by 31 zeros
(10000000000000000000000000000000), which will reset time to the year 1901 or 1970,
depending on implementation, because that toggles the sign bit. As many applications use OS
library routines for date calculations, the impact of this could be felt much earlier than 2038; for
instance, 30-year mortgages may be calculated incorrectly beginning in the year 2008.

Since times before 1970 are rarely represented in Unix time, one possible solution that is
compatible with existing binary formats would be to redefine time_t as "unsigned 32-bit integer".
However, such a kludge merely postpones the problem to February 7, 2106, and could introduce
bugs in software that compares differences between two times.

Some Unix versions have already addressed this. For example, in Solaris on 64-bit systems,
time_t is 64 bits long, meaning that the OS itself and 64-bit applications will correctly handle dates
for some 292 billion years (several times greater than the age of the universe). Existing 32-bit
applications using a 32-bit time_t continue to work on 64-bit Solaris systems but are still prone to
the 2038 problem.
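The rollover itself can be sketched in a few lines of C. The example below (an illustration, not from the source) uses explicit 32-bit values so that it does not depend on the platform's time_t width, and assumes a 64-bit time_t is available for printing the resulting dates:

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    int main(void)
    {
        /* 0x7FFFFFFF seconds after the Unix Epoch: 03:14:07 UTC, 19 January 2038. */
        time_t last = (time_t)INT32_MAX;
        printf("last 32-bit second : %s", asctime(gmtime(&last)));

        /* One second later the sign bit flips; a signed 32-bit time_t would then
         * hold INT32_MIN, which is interpreted as a date in December 1901. */
        time_t wrapped = (time_t)INT32_MIN;
        printf("after the rollover : %s", asctime(gmtime(&wrapped)));
        return 0;
    }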

Free Unix-like operating systems

Fig. 9. Linux is a modern Unix-like system

In 1983, Richard Stallman announced the GNU project, an ambitious effort to create a free
software Unix-like system; "free" in that everyone who received a copy would be free to use, study,
modify, and redistribute it. GNU's goal was largely achieved in 1992: its own kernel development project,
GNU Hurd, had not produced a working kernel, but a compatible kernel called Linux was released
as free software in 1992 under the GNU General Public License. The combination of the two is
frequently referred to simply as "Linux", although the Free Software Foundation and some Linux
distributions, such as Debian GNU/Linux, use the combined term GNU/Linux. Work on GNU Hurd
continues, although very slowly.

In addition to their use in the Linux operating system, many GNU packages — such as the GNU
Compiler Collection (and the rest of the GNU toolchain), the GNU C library and the GNU core
utilities — have gone on to play central roles in other free Unix systems as well.

Linux distributions, comprising Linux and large collections of compatible software have
become popular both with hobbyists and in business. Popular distributions include Red Hat
Enterprise Linux, SUSE Linux, Mandriva Linux, Fedora, Ubuntu, Debian GNU/Linux, Slackware
Linux and Gentoo.

A free derivative of BSD Unix, 386BSD, was also released in 1992 and led to the NetBSD
and FreeBSD projects. With the 1994 settlement of a lawsuit that UNIX Systems Laboratories
brought against the University of California and Berkeley Software Design Inc. (USL v. BSDi), it
was clarified that Berkeley had the right to distribute BSD Unix — for free, if it so desired. Since
then, BSD Unix has been developed in several different directions, including the OpenBSD and
DragonFly BSD variants.

Linux and the BSD kin are now rapidly occupying the market traditionally occupied by
proprietary Unix operating systems, as well as expanding into new markets such as the consumer
desktop and mobile and embedded devices. A measure of this success may be seen when Apple
Computer incorporated BSD into its Macintosh operating system by way of NEXTSTEP. Due to
the modularity of the Unix design, sharing bits and pieces is relatively common; consequently, most
or all Unix and Unix-like systems include at least some BSD code, and modern BSDs also typically
include some GNU utilities in their distribution, so Apple's combination of parts from NeXT and
FreeBSD with Mach and some GNU utilities has precedent.

In 2005, Sun Microsystems released the bulk of the source code to the Solaris operating
system, a System V variant, under the name OpenSolaris, making it the first actively developed
commercial Unix system to be open sourced (several years earlier, Caldera had released many of
the older Unix systems under an educational and later BSD license). As a result, a great deal of
formerly proprietary AT&T/USL code is now freely available.

Branding

In October 1993, Novell, the company that owned the rights to the Unix System V source at
the time, transferred the trademarks of Unix to the X/Open Company (now The Open Group),[9] and
in 1995 sold the related business operations to Santa Cruz Operation.[10] Whether Novell also sold
the copyrights to the actual software is currently the subject of litigation in a federal lawsuit, SCO v.
Novell. Unix vendor SCO Group Inc. accused Novell of slander of title.

The present owner of the trademark UNIX® is The Open Group, an industry standards consortium.
Only systems fully compliant with and certified to the Single UNIX Specification qualify as
"UNIX®" (others are called "Unix system-like" or "Unix-like"). The term UNIX is not an acronym,
but follows the early convention of naming computer systems in capital letters, such as ENIAC and
MISTIC.

By decree of The Open Group, the term "UNIX®" refers more to a class of operating systems than
to a specific implementation of an operating system; those operating systems which meet The Open
Group's Single UNIX Specification should be able to bear the UNIX® 98 or UNIX® 03 trademarks
today, after the operating system's vendor pays a fee to The Open Group. Systems licensed to use
the UNIX® trademark include AIX, HP-UX, IRIX, Solaris, Tru64, A/UX, Mac OS X 10.5 on Intel
platforms[11], and a part of z/OS.

Sometimes a representation like "Un*x", "*NIX", or "*N?X" is used to indicate all operating
systems similar to Unix. This comes from the use of the "*" and "?" characters as "wildcard"
characters in many utilities. This notation is also used to describe other Unix-like systems, e.g.
Linux, FreeBSD, etc., that have not met the requirements for UNIX® branding from the Open
Group.

The Open Group requests that "UNIX®" always be used as an adjective followed by a generic
term such as "system" to help avoid the creation of a genericized trademark.

The term "Unix" is also used, and in fact was the original capitalisation, but the name UNIX
stuck because, in the words of Dennis Ritchie "when presenting the original Unix paper to the third
Operating Systems Symposium of the American Association for Computing Machinery, we had just
acquired a new typesetter and were intoxicated by being able to produce small caps" (quoted from
the Jargon File, version 4.3.3, 20 September 2002). Additionally, it should be noted that many of
the operating system's predecessors and contemporaries used all-uppercase lettering, because many
computer terminals of the time could not produce lower-case letters, so many people wrote the
name in upper case due to force of habit.

Several plural forms of Unix are used to refer to multiple brands of Unix and Unix-like
systems. Most common is the conventional "Unixes", but the hacker culture which created Unix has
a penchant for playful use of language, and "Unices" (treating Unix as a Latin noun of the third
declension) is also popular. The Anglo-Saxon plural form "Unixen" is not common, although
occasionally seen.

Trademark names can be registered by different entities in different countries and trademark
laws in some countries allow the same trademark name to be controlled by two different entities if
each entity uses the trademark in easily distinguishable categories. The result is that Unix has been
used as a brand name for various products including book shelves, ink pens, bottled glue, diapers,
hair driers and food containers. [2].

Common Unix commands

Widely used Unix commands include:
• Directory and file creation and navigation: ls cd pwd mkdir rm rmdir cp find touch
• File viewing and editing: more less ed vi emacs head tail
• Text processing: echo cat grep sort uniq sed awk cut tr split printf
• File comparison: comm cmp diff patch
• Miscellaneous shell tools: yes test xargs
• System administration: chmod chown ps su w who
• Communication: mail telnet ftp finger ssh
• Authentication: su login passwd

Cache memory

DISK CACHING

To help understand the theory of caching, visualize an old, hand-operated water pump. Each
stroke of the pump's handle delivers a set amount of water into a glass. It may take two or three
handle strokes to fill a glass. Now, visualize several glasses that need to be filled. You are
constantly pumping the handle to keep up with the demand. Next, introduce a holding tank. With
this, instead of the water going directly into a glass, it goes into the tank. The advantage is, once the
holding tank is filled, constant pumping is not required to keep up with the demand.

Disk caching may be thought of as an electronic version of a holding tank. With MS-DOS version
5.0, the holding tank is built in with Smartdrv.sys.

Cache: A bank of high-speed memory set aside for frequently accessed data. The term "caching"
describes placing data in the cache. Memory caching and disk caching are the two most common
methods used by PCs.

Keeping the most frequently used disk sectors in main memory (hereafter, RAM) is
called disk caching. It is used to increase the speed of information exchange between the hard disk and
RAM. It is well known that the relatively low speed of information exchange between these two
devices used to be one of the weakest points limiting computer performance. No doubt, there are
other weak points as well, for example the exchange between a fast microprocessor and slower
RAM, but, as long as DOS offers no way of dealing with such problems, we are not
going to consider them here.

To perform disk caching, a special buffer region called the cache is set aside in RAM. It works
as a channel for information exchange and is managed by a resident program called the cache
manager.

Data that has been read is placed into the cache and kept there until a newer portion of data replaces it.
When specific data is required again, it can be retrieved from the fast cache and there is no need to
read it from the disk. So the speed of reading data from the "disk" increases. This technique is
usually called read-through caching.

An even more noticeable effect is achieved by read-ahead: reading data and placing it into the cache
before the operating system requests it, because this operation can be performed without
leaving the microprocessor idle, that is, asynchronously.

Most modern cache managers provide caching not only for reading but also for writing. Write
caching is used when the operating system asks for data to be placed on disk. The data is first
placed into the cache and then, when it is "convenient" for the PC, written to disk, so
that the real write to disk happens asynchronously. Further on we will call this process
"intermediate writing" (write-back). After the data is written into the cache instead of to the disk,
DOS is notified that the write operation is complete. Since this is accomplished much faster than
writing straight to the disk, write caching is very effective. The effect is even more noticeable
for operations such as:

• updating data recently written to the disk (with caching it can simply be refreshed in RAM);
• re-using (repeatedly reading) data recently written to the disk (with caching it can be supplied
without reading it back from the disk).
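The mechanism described above can be sketched as a toy, direct-mapped sector cache with read-through and write-back behaviour. Everything in this sketch (the slot count, the sector size, and the disk_read/disk_write stand-ins) is hypothetical; it is not SMARTDRV, only an illustration of the idea:

    #include <stdio.h>
    #include <string.h>

    #define SECTOR_SIZE 512
    #define CACHE_SLOTS 8

    struct slot {
        long sector;   /* which disk sector the slot holds; -1 means empty */
        int  dirty;    /* 1 = changed in the cache but not yet written out */
        char data[SECTOR_SIZE];
    };

    static struct slot cache[CACHE_SLOTS];

    /* Stand-ins for the real (slow) disk transfers. */
    static void disk_read(long sector, char *buf)
    {
        printf("physical read  of sector %ld\n", sector);
        memset(buf, 0, SECTOR_SIZE);
    }
    static void disk_write(long sector, const char *buf)
    {
        (void)buf;
        printf("physical write of sector %ld\n", sector);
    }

    static struct slot *find_slot(long sector)
    {
        struct slot *s = &cache[sector % CACHE_SLOTS];  /* direct-mapped lookup   */
        if (s->sector != sector) {                      /* cache miss             */
            if (s->sector != -1 && s->dirty)
                disk_write(s->sector, s->data);         /* write-back on eviction */
            disk_read(sector, s->data);                 /* read-through on miss   */
            s->sector = sector;
            s->dirty  = 0;
        }
        return s;
    }

    void cached_read(long sector, char *buf)
    {
        memcpy(buf, find_slot(sector)->data, SECTOR_SIZE);
    }

    void cached_write(long sector, const char *buf)
    {
        struct slot *s = find_slot(sector);
        memcpy(s->data, buf, SECTOR_SIZE);
        s->dirty = 1;    /* the real disk write happens later ("intermediate writing") */
    }

    int main(void)
    {
        for (int i = 0; i < CACHE_SLOTS; i++)
            cache[i].sector = -1;

        char buf[SECTOR_SIZE] = { 0 };
        cached_read(7, buf);     /* first access: one real disk read               */
        cached_read(7, buf);     /* repeated access: served from the cache, no I/O */
        cached_write(7, buf);    /* marked dirty; written to disk only on eviction */
        return 0;
    }

A real cache manager would also flush dirty slots periodically and at shutdown, rather than only when a slot is evicted.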

Besides increasing the performance of the PC, disk caching helps extend the
working lifetime of the hard disk by reducing disk wear.

Disk caching successfully combines the advantages of I/O buffering and of a virtual (RAM)
disk: like a virtual disk, it keeps large amounts of data in
RAM, and, like I/O buffers, it keeps only the most frequently used data. This minimizes
the amount of RAM that must be set aside as a buffer. Besides, caching (like
I/O buffering) is completely "transparent" to users and programs, whereas with a virtual disk the user
must copy files to it manually and may then have to reconfigure the programs that use
those files. However, a cache manager is usually much larger than a virtual disk driver because of
the amount of work it has to perform, and this can lead some users to do without disk caching.

DOS simply cannot do without I/O buffering, which is a simplified form of
caching. It is therefore implemented with compact code: read-ahead and intermediate
writing are not performed. However, the purpose of I/O buffering is not just to minimize repeated access
to the same data, but also to extract logical records from physical records and, vice versa,
to build physical records from logical records. A physical record is the portion of data
transferred between RAM and external memory (for disks, the contents of a sector). A logical
record is the portion of data requested by a program or output by it. The I/O buffering tools
allow a physical record to be read only once, even if it contains several logical records
needed by a program. Likewise, a physical record is written to disk only
after it has been assembled from several logical records. Without the I/O buffering tools, reading each
logical record (even from the same sector) would cause that sector to be read from the
disk again, and outputting each logical record would require writing the whole
physical record to disk, after first reading it in and updating it. All these operations,
in addition to wasting a significant amount of time, would require additional effort from the programmers.

Since the I/O buffering tools perform the blocking and unblocking of physical records, the caching
tools only need to organize work with physical records (for disks, the contents of the
sectors).
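A hedged sketch of this blocking/unblocking idea in C: one 512-byte physical record (a sector) is read once, and several 64-byte logical records are then served from the buffer without touching the disk again. Names and sizes are illustrative only:

    #include <stdio.h>
    #include <string.h>

    #define PHYS_SIZE 512   /* physical record: what is transferred from the disk */
    #define LOG_SIZE   64   /* logical record: what the program asks for          */

    static char buffer[PHYS_SIZE];
    static long buffered_sector = -1;

    /* Stand-in for a real sector read. */
    static void read_sector(long sector, char *buf)
    {
        printf("physical read of sector %ld\n", sector);
        memset(buf, 0, PHYS_SIZE);
    }

    /* Return the n-th logical record; the disk is touched only when the record
     * lies in a sector that is not already in the buffer ("unblocking"). */
    void get_logical_record(long n, char *out)
    {
        long sector = (n * LOG_SIZE) / PHYS_SIZE;
        if (sector != buffered_sector) {
            read_sector(sector, buffer);
            buffered_sector = sector;
        }
        memcpy(out, buffer + (n * LOG_SIZE) % PHYS_SIZE, LOG_SIZE);
    }

    int main(void)
    {
        char rec[LOG_SIZE];
        for (long n = 0; n < 10; n++)   /* 10 logical reads cause only 2 sector reads */
            get_logical_record(n, rec);
        return 0;
    }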

Disk Caching With MS-DOS' Smartdrv.sys

Total system performance is a composite of several factors. Two main factors are central
processing unit (CPU) type and speed, and hard drive access time. Other factors in the mix are the
software programs themselves. Certain programs, like databases and some computer-aided design
(CAD) packages, constantly access your hard drive by opening and closing files. Since the hard
drive is a mechanical device with parts like read/write heads that physically access data, this
constant access slows things down. Short of buying faster equipment, changing the way data is
transferred to the CPU is the most effective way to speed up your system. This can be done with
disk caching (pronounced disk "cashing").

Memory control drivers

SMARTDRV.SYS is a disk caching program for DOS and Windows 3.x systems. The
SmartDrive program keeps a copy of recently accessed hard disk data in memory. When a program
or MS-DOS reads data, SmartDrive first checks to see whether it already has a copy and, if so, supplies it
instead of reading from the hard disk.

Memory Management

• First 640k is Conventional Memory
• 640k to 1024k is Upper Memory
• Above 1024k is Extended Memory
• HIMEM.SYS is loaded in CONFIG.SYS as the first driver; it manages the Extended
Memory area and makes it available as XMS (Extended Memory Specification). The first 64k of
extended memory is labeled the High Memory Area (HMA). DOS can be placed there by putting
DOS=HIGH in CONFIG.SYS.
• EMM386.EXE is loaded in CONFIG.SYS after HIMEM.SYS has been successfully
loaded. It uses the hardware-reserved 384k of space in upper memory (640k-1024k)
and provides EMS (Expanded Memory Specification).
• Virtual Memory relies upon EMS (therefore EMM386.EXE) and uses hard disk space as
memory.
• Memory control commands

Software operation

Computer software has to be "loaded" into the computer's storage (also known as memory
and RAM).

Once the software is loaded, the computer is able to operate the software. Computers operate
by executing the computer program. This involves passing instructions from the application
software, through the system software, to the hardware which ultimately receives the instruction as
machine code. Each instruction causes the computer to carry out an operation -- moving data,
carrying out a computation, or altering the control flow of instructions.

Data movement is typically from one place in memory to another. Sometimes it involves
moving data between memory and registers which enable high-speed data access in the CPU.
Moving data, especially large amounts of it, can be costly. So, this is sometimes avoided by using
"pointers" to data instead. Computations include simple operations such as incrementing the value
of a variable data element. More complex computations may involve many operations and data
elements together.

Instructions may be performed sequentially, conditionally, or iteratively. Sequential
instructions are those operations that are performed one after another. Conditional instructions are
performed such that different sets of instructions execute depending on the value(s) of some data. In
some languages this is known as an "if" statement. Iterative instructions are performed repetitively
and may depend on some data value. This is sometimes called a "loop." Often, one instruction may
"call" another set of instructions that are defined in some other program or module. When more
than one computer processor is used, instructions may be executed simultaneously.
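A small illustrative C fragment (not from the source) showing the three kinds of control flow, plus a data access through a pointer:

    #include <stdio.h>

    int main(void)
    {
        int a = 1, b = 2, total = 0;
        int *p = &b;                  /* a pointer: refer to data instead of copying it */

        a = a + *p;                   /* sequential: one instruction after another      */

        if (a > 2)                    /* conditional: the branch taken depends on data  */
            total = a;
        else
            total = *p;

        for (int i = 0; i < 3; i++)   /* iterative: a loop repeats instructions         */
            total = total + i;

        printf("total = %d\n", total);
        return 0;
    }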

A simple example of the way software operates is what happens when a user selects an entry
such as "Copy" from a menu. In this case, a conditional instruction is executed to copy text from
data in a document to a clipboard data area. If a different menu entry such as "Paste" is chosen, the
software executes the instructions to copy the text in the clipboard data area to a place in the
document.

Depending on the application, even the example above could become complicated. The field
of software engineering endeavors to manage the complexity of how software operates. This is
especially true for software that operates in the context of a large or powerful computer system.
Kinds of software by operation: computer program as executable, source code or script,
configuration.

Three layers of software

Starting in the 1980s, application software has been sold in mass-produced packages
through retailers

Users often see things differently than programmers. People who use modern general
purpose computers (as opposed to embedded systems, analog computers, supercomputers, etc.)
usually see three layers of software performing a variety of tasks: platform, application, and user
software.

Platform software

Platform includes the basic input-output system (often described as firmware rather than
software), device drivers, an operating system, and typically a graphical user interface
which, in total, allow a user to interact with the computer and its peripherals (associated
equipment). Platform software often comes bundled with the computer, and users may not
realize that it exists or that they have a choice to use different platform software.

Application software

Application software or Applications are what most people think of when they think of
software. Typical examples include office suites and video games. Application software is
often purchased separately from computer hardware. Sometimes applications are bundled
with the computer, but that does not change the fact that they run as independent
applications. Applications are almost always independent programs from the operating
system, though they are often tailored for specific platforms. Most users think of compilers,
databases, and other "system software" as applications.

User-written software

User software tailors systems to meet the user's specific needs. User software includes
spreadsheet templates, word processor macros, scientific simulations, and graphics and
animation scripts. Even email filters are a kind of user software. Users create this software
themselves and often overlook how important it is. Depending on how competently the
user-written software has been integrated into purchased application packages, many users may
not be aware of the distinction between the purchased packages and what has been added by
fellow co-workers.

System, programming and application software

Practical computer systems divide software into three major classes: system software,
programming software and application software, although the distinction is somewhat arbitrary, and
often blurred.

System software helps run the computer hardware and computer system. It includes
operating systems, device drivers, diagnostic tools, servers, windowing systems, utilities and more.
The purpose of systems software is to insulate the applications programmer as much as possible
from the details of the particular computer complex being used, especially memory and other
hardware features, and such accessory devices as communications, printers, readers, displays,
keyboards, etc.

Programming software usually provides tools to assist a programmer in writing computer
programs and software using different programming languages in a more convenient way. The tools
include text editors, compilers, interpreters, linkers, debuggers, and so on. An integrated
development environment (IDE) merges those tools into a single software bundle, and a programmer may
not need to type multiple commands for compiling, interpreting, debugging, tracing, and so on,
because the IDE usually has an advanced graphical user interface, or GUI.

Application software allows humans to accomplish one or more specific (non-computer
related) tasks. Typical applications include industrial automation, business software, educational
software, medical software, databases and computer games. Businesses are probably the biggest
users of application software, but almost every field of human activity (from a-bombs to zymurgy)
now uses some form of application software. It is used to automate all sorts of functions. Many
examples may be found at the Business Software Directory.

Software program and library

A software program may not be sufficiently complete for execution by a computer. In
particular, it may require additional software from a software library in order to be complete. Such a
library may include software components used by stand-alone programs, but which cannot be
executed on their own. Thus, programs may include standard routines that are common to many
programs, extracted from these libraries. Libraries may also include 'stand-alone' programs which
are activated by some computer event and/or perform some function (e.g., of computer
'housekeeping') but do not return data to their activating program. Programs may be called by other
programs and/or may call other programs.
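For example (a minimal sketch, assuming a Unix-style C toolchain), the program below is not complete on its own: it calls sqrt(), which lives in the math library and is combined with the program's own object code by the linker, typically with a command such as "cc prog.c -lm":

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* sqrt() is not defined in this source file; the linker pulls the
         * compiled routine out of the math library (libm). */
        printf("sqrt(2) = %f\n", sqrt(2.0));
        return 0;
    }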

Computer software

The various programs by which a computer controls aspects of its operations, such as those
for translating data from one form to another.

Computer software (or simply software) refers to any of the various programs by which a
computer controls aspects of its operations, such as those for translating data from one form to
another, as contrasted with hardware, which is the physical equipment comprising the installation.
The term is roughly synonymous with computer program but is more generic in scope. In most
computers, the moment-to-moment control of the machine resides in a special software program
called an operating system, or supervisor. Other forms of software include assemblers and
compilers for programming languages and applications for business and home use (see computer
program). Software is of great importance; the usefulness of a highly sophisticated array of
hardware can be severely compromised by the lack of adequate software.

The term "software" was first used in this sense by John W. Tukey in 1957. In computer
science and software engineering, computer software is all information processed by computer
systems, programs and data. The concept of reading different sequences of instructions into the
memory of a device to control computations was invented by Charles Babbage as part of his
analytical engine. The theory that is the basis for most modern software was first proposed by Alan
Turing in an essay.

Relationship to hardware

Computer software is so called in contrast to computer hardware, which encompasses the
physical interconnections and devices required to store and execute (or run) the software. In
computers, software is loaded into RAM and executed in the central processing unit. At the lowest
level, software consists of a machine language specific to an individual processor. A machine
language consists of groups of binary values (which may be represented by octal or hexadecimal
numerals) signifying processor instructions (object code), which change the state of the computer
from its preceding state. Software is an ordered sequence of instructions for changing the state of
the computer hardware in a particular sequence. It is generally written in high-level programming
languages that are easier and more efficient for humans to use (closer to natural language) than
machine language. High-level languages are compiled or interpreted into machine language object
code. Software may also be written in an assembly language, essentially, a mnemonic
representation of a machine language using a natural language alphabet. Assembly language must
be assembled into object code via an assembler.

Relationship to data

Software has historically been considered an intermediary between electronic hardware and
data, which the hardware then processes according to the sequence of instructions defined by the
software. As computational math becomes increasingly complex, the distinction between software
and data becomes less precise. Data has generally been considered as either the output or input of
executed software. However, data is not the only possible output or input. For example, (system)
configuration information may also be considered input, although not necessarily considered data
(and certainly not applications data). The output of a particular piece of executed software may be
the input for another executed piece of software. Therefore, software may be considered an
interface between hardware, data, and/or (other) software.

CONFIG.SYS

CONFIG.SYS is the primary configuration file for the MS-DOS and OS/2 operating
systems. It is a special file that contains setup or configuration instructions for the computer system.
The commands in this file configure DOS for use with devices and applications in the system. The
commands also set up the memory managers in the system. After processing the CONFIG.SYS file,
DOS proceeds to load and execute the command shell specified in the shell= line of CONFIG.SYS,
or COMMAND.COM if there is no such line. The command shell in turn is responsible for
processing the AUTOEXEC.BAT file.

The system can still boot if these files are missing or corrupted. However, both files are
essential for the complete boot process to occur with the DOS operating system. They contain
information used to customize the operating system for the user, as well as the requirements of
different software application packages. A DOS system may require troubleshooting if either of
these files becomes damaged or corrupted.

CONFIG.SYS is composed mostly of name=value statements which look like variable
assignments. In fact these statements either define tunable parameters, often resulting in the
reservation of memory, or load files, mostly TSRs and device drivers, into memory.

In DOS, CONFIG.SYS is located in the root directory of the drive from which DOS was
booted. In some versions of DOS it may have an alternate filename, e.g. FDCONFIG.SYS in
FreeDOS, or DCONFIG.SYS in some versions of DR-DOS.

Both CONFIG.SYS and AUTOEXEC.BAT can still be found included in the system files of
the later Microsoft Windows operating systems, although usually these files are empty. OS/2 did
not use AUTOEXEC.BAT, using STARTUP.CMD instead.

In the OS/2 subsystem of Windows NT, what appeared as CONFIG.SYS to OS/2 programs
was actually stored in the registry.

Example CONFIG.SYS file for DOS
device = c:\dos\himem.sys
device = c:\dos\emm386.exe umb
dos = high,umb
devicehigh = c:\windows\mouse.sys
devicehigh = c:\dos\setver.exe
devicehigh = c:\dos\smartdrv.exe
country = 044,437,c:\dos\country.sys
shell = c:\dos\command.com c:\dos /e:512 /p

Windows NT

On Windows NT and its derivatives, Windows 2000 and Windows XP, the equivalent file is
called AUTOEXEC.NT and is located in the %SystemRoot%\system32 directory. The file is not
used during the operating system boot process; it is executed when the MS-DOS environment is
started, which occurs when an MS-DOS application is loaded.

The AUTOEXEC.BAT file may often be found on Windows NT, in the root directory of the
boot drive. Windows only considers the "SET" statements it contains, in order to define
environment variables global to all users. Setting environment variables through this file can be
useful if, for example, MS-DOS is also booted from this drive (which requires that the drive be
FAT) or to keep the variables across a reinstall. This is an exotic usage today, so the file almost
always remains empty. The TweakUI applet from the PowerToys collection allows this feature to
be controlled ("Parse Autoexec.bat at logon").

AUTOEXEC.BAT

AUTOEXEC.BAT is a file found on the MS-DOS operating system. It is a plain-text batch
file located in the root directory of the boot device.

Usage

AUTOEXEC.BAT is only used on MS-DOS or Microsoft Windows versions based on
MS-DOS, such as Windows 3.x, Windows 95, Windows 98, and Windows Me. The file is executed
once the operating system has booted and after the CONFIG.SYS file has been processed. On
Windows, this occurs before the graphical environment has been started.

AUTOEXEC.BAT is most often used to set environment variables and run virus scanners,
system enhancements, utilities, and driver handlers that must operate at the lowest level possible.
Applications that run within the Windows environment upon its loading are listed in the Windows
registry.

Lines prefixed with the string "REM" are remarks and are not run as part of
AUTOEXEC.BAT. The "REM" lines are used for comments or to disable drivers (say, for a
CD-ROM).
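To parallel the example CONFIG.SYS file given earlier, a typical AUTOEXEC.BAT might look
something like the following; the exact paths, drivers and settings are illustrative only and vary
from system to system.

@echo off
path c:\dos;c:\windows
set temp=c:\temp
lh c:\dos\mscdex.exe /d:mscd001
rem set blaster=a220 i5 d1   (a disabled sound-card setting, kept as a comment)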

Floppy disk


A floppy disk is a data storage device that is composed of a disk of thin, flexible (i.e.
"floppy") magnetic storage medium encased in a square or rectangular plastic wallet. Floppy disks
are read and written by a floppy disk drive or FDD, the latter initialism not to be confused with
"fixed disk drive", which is an old IBM term for a hard disk drive.
Floppy disk format            Year introduced   Storage capacity (binary kilobytes if not stated)   Marketed capacity¹
8-inch (read-only)            1969              80                                                  ←
8-inch                        1972              183.1                                               1.5 Megabit
8-inch                        1973              256                                                 256 KB
8-inch DD                     1976              500                                                 0.5 MB
5¼-inch (35 track)            1976              89.6                                                110 KB
8-inch double sided           1977              1200                                                1.2 MB
5¼-inch DD                    1978              360                                                 360 KB
3½-inch HP single sided       1982              280                                                 264 KB
3-inch                        1982?             360                                                 ←
3½-inch (DD at release)       1984              720                                                 720 KB
5¼-inch QD                    1984              1200                                                1.2 MB
3-inch DD                     1984              720                                                 ←
3-inch Mitsumi Quick Disk     1985              128 to 256                                          ←
2-inch                        1985              720                                                 ←
5¼-inch Perpendicular         1986?             100 MiB                                             ←
3½-inch HD                    1987              1440                                                1.44 MB
3½-inch ED                    1991              2880                                                2.88 MB
3½-inch LS-120                1996              120.375 MiB                                         120 MB
3½-inch LS-240                1997              240.75 MiB                                          240 MB
3½-inch HiFD                  1998/99           150/200 MiB?                                        150/200 MB

Acronyms: DD = Double Density; QD = Quad Density; HD = High Density; ED = Extended Density;
LS = Laser Servo; HiFD = High capacity Floppy Disk

¹The marketed capacities of floppy disks frequently corresponded only vaguely to their actual
storage capacities; the 1.44 MB value for the 3½-inch HD floppies is the most widely known
example. See reported storage capacity.
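As a quick illustration of that discrepancy, the arithmetic for the 3½-inch HD format can be checked
with a few lines of Python (a minimal sketch; 80 tracks per side, 18 sectors per track and 2 sides are
the standard PC geometry for that format):

bytes_per_sector = 512
sectors_per_track = 18
tracks_per_side = 80
sides = 2
total_bytes = bytes_per_sector * sectors_per_track * tracks_per_side * sides
print(total_bytes)                  # 1474560 bytes
print(total_bytes // 1024)          # 1440 binary kilobytes, the value listed in the table
print(total_bytes / (1024 * 1000))  # 1.44 -- the marketed "1.44 MB" mixes binary and decimal units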

Dates and capacities marked ? are of unclear origin and need source information; other listed
capacities refer to:

• For 8-inch: standard IBM formats as used by the System/370 mainframes and newer
systems

• For 5¼- and 3½-inch: standard PC formats, capacities quoted are the total size of all sectors
on the disk and include space used for the bootsector and filesystem

Other formats may get more or less capacity from the same drives and disks.

File Allocation Table

A partition is divided up into identically sized clusters, small blocks of contiguous space.
Cluster sizes vary depending on the type of FAT file system being used and the size of the partition;
typically, cluster sizes lie somewhere between 2 KB and 32 KB. Each file may occupy one or more
of these clusters depending on its size; thus, a file is represented by a chain of these clusters
(referred to as a singly linked list). However these chains are not necessarily stored adjacent to one
another on the disk's surface but are often instead fragmented throughout the Data Region.

The File Allocation Table (FAT) is a list of entries that map to each cluster on the partition.
Each entry records one of five things:

• the address of the next cluster in a chain
• a special end of file (EOF) character that indicates the end of a chain
• a special character to mark a bad cluster
• a special character to mark a reserved cluster
• a zero to indicate that the cluster is unused

Each version of the FAT file system uses a different size for FAT entries. The size is
indicated by the name, for example the FAT16 file system uses 16 bits for each entry while the
FAT32 file system uses 32 bits. This difference means that the File Allocation Table of a FAT32
system can map a greater number of clusters than FAT16, allowing for larger partition sizes with
FAT32. This also allows for more efficient use of space than FAT16, because on the same hard
drive a FAT32 table can address smaller clusters which means less wasted space.
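To make the cluster-chain idea concrete, here is a minimal Python sketch that follows a FAT16 chain
through an in-memory table; the function name is illustrative, and the special values follow the
ranges shown in the table below.

def walk_fat16_chain(fat, first_cluster):
    """Return the list of clusters occupied by a file, following FAT16 entries."""
    chain = []
    cluster = first_cluster
    while True:
        chain.append(cluster)
        entry = fat[cluster]
        if entry >= 0xFFF8:            # end-of-chain marker: last cluster in the file
            return chain
        if entry == 0xFFF7:            # bad cluster: the chain is corrupt
            raise ValueError("cluster chain runs into a bad cluster")
        cluster = entry                # otherwise the entry points to the next cluster

# A toy FAT in which a three-cluster file starts at cluster 2:
fat = {2: 3, 3: 7, 7: 0xFFFF}
print(walk_fat16_chain(fat, 2))        # [2, 3, 7]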

FAT12            FAT16              FAT32                       Description
0x000            0x0000             0x?0000000                  Free Cluster
0x001            0x0001             0x?0000001                  Reserved Cluster
0x002 - 0xFEF    0x0002 - 0xFFEF    0x?0000002 - 0x?FFFFFEF     Used cluster; value points to next cluster
0xFF0 - 0xFF6    0xFFF0 - 0xFFF6    0x?FFFFFF0 - 0x?FFFFFF6     Reserved values
0xFF7            0xFFF7             0x?FFFFFF7                  Bad cluster
0xFF8 - 0xFFF    0xFFF8 - 0xFFFF    0x?FFFFFF8 - 0x?FFFFFFF     Last cluster in file

Note that FAT32 uses only 28 bits of the 32 possible bits. The upper 4 bits are usually zero
but are reserved and should be left untouched. In the table above these are denoted by a question
mark.
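In code, this simply means masking a raw FAT32 entry down to its low 28 bits before interpreting it,
for example (an illustrative snippet):

FAT32_ENTRY_MASK = 0x0FFFFFFF          # only the low 28 bits are meaningful

def fat32_value(raw_entry):
    """Strip the reserved upper 4 bits of a raw 32-bit FAT32 entry."""
    return raw_entry & FAT32_ENTRY_MASK

print(hex(fat32_value(0xF0000003)))    # 0x3 -- the upper nibble is ignored, not rewritten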

The first cluster of the data area is cluster #2. That leaves the first two entries of the FAT
unused. In the first byte of the first entry a copy of the media descriptor is stored. The remaining
bits of this entry are 1. In the second entry the end-of-file marker is stored. The high order two bits
of the second entry are sometimes, in the case of FAT16 and FAT32, used for dirty volume
management: high order bit 1: last shutdown was clean; next highest bit 1: during the previous
mount no disk I/O errors were detected.[6]
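A small sketch of how those two FAT16 flag bits could be tested, assuming the second FAT entry
has already been read into an integer (the function name is hypothetical; the bit positions follow the
description above):

def fat16_volume_flags(fat_entry_1):
    """Interpret the dirty-volume bits stored in the second FAT16 entry, FAT[1]."""
    clean_shutdown = bool(fat_entry_1 & 0x8000)   # high-order bit: last shutdown was clean
    no_io_errors   = bool(fat_entry_1 & 0x4000)   # next bit: no disk I/O errors on previous mount
    return clean_shutdown, no_io_errors

print(fat16_volume_flags(0xFFFF))      # (True, True): both flags set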

Initial FAT16

In 1984 IBM released the PC AT, which featured a 20 MB hard disk. Microsoft introduced
MS-DOS 3.0 in parallel. Cluster addresses were increased to 16-bit, allowing for a greater number
of clusters (up to 65,517) and consequently much greater filesystem sizes. However, the maximum
possible number of sectors and the maximum (partition, rather than disk) size of 32 MB did not
change. Therefore, although technically already "FAT16", this format was not yet what today is
commonly understood under this name. A 20 MB hard disk formatted under MS-DOS 3.0 was not
accessible by the older MS-DOS 2.0. Of course, MS-DOS 3.0 could still access MS-DOS 2.0 style
8 KB cluster partitions.

MS-DOS 3.0 also introduced support for high-density 1.2 MB 5.25" diskettes, which
notably had 15 sectors per track, hence more space for FAT. This probably prompted a dubious
optimization of the cluster size, which went down from 2 sectors to just 1. The net effect was that
high density diskettes were significantly slower than older double density ones.
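For reference, the capacity of those high-density diskettes follows directly from the geometry
(a quick check; 80 tracks per side and 2 sides are the standard figures for the format):

bytes_per_sector = 512
sectors_per_track = 15      # the 15 sectors per track mentioned above
tracks_per_side = 80
sides = 2
print(bytes_per_sector * sectors_per_track * tracks_per_side * sides)  # 1228800 bytes = 1200 KiB ("1.2 MB")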

Extended partition and logical drives

Apart from improving the structure of the FAT filesystem itself, a parallel development
allowing an increase in the maximum possible FAT storage space was the introduction of disk
partitions. PC hard disks can only have up to 4 primary partitions, due to the fixed structure of the
partition table in the master boot record (MBR). However, by design choice DOS would only use
the partition marked as active, which was also the one the MBR would boot. It was not possible to
create multiple primary DOS partitions using DOS tools, and third party tools would warn that such
a scheme would not be compatible with DOS.

To allow the use of more partitions in a compatible way a new partition type was introduced (in
MS-DOS 3.2, January 1986), the extended partition, which was actually just a container for
additional partitions called logical drives. Originally only 1 logical drive was possible, allowing the
use of hard-disks up to 64 MB. In MS-DOS 3.3 (August 1987) this limit was increased to 24 drives;
it probably came from the compulsory letter-based C: - Z: disk naming. The logical drives were
described by on-disk structures which closely resemble MBRs, probably to simplify coding, and
they were chained/nested in a way analogous to Russian matryoshka dolls. Only one extended
partition was allowed.

Prior to the introduction of extended partitions, some hard disk controllers (which at that
time were separate option boards, since the IDE standard did not yet exist) could make large hard
disks appear as two separate disks. Alternatively, special software drivers, like Ontrack's Disk
Manager could be installed for the same purpose.

Final FAT16

Finally in November 1987, in Compaq DOS 3.31, came what is today called the FAT16
format, with the expansion of the 16-bit disk sector index to 32 bits. The result was initially called
the DOS 3.31 Large File System. Although the on-disk changes were apparently minor, the entire
DOS disk code had to be converted to use 32-bit sector numbers, a task complicated by the fact that
it was written in 16-bit assembly language.

In 1988 the improvement became more generally available through MS-DOS 4.0. The limit
on partition size was now dictated by the 8-bit signed count of sectors-per-cluster, which had a
maximum power-of-two value of 64. With the usual hard disk sector size of 512 bytes, this gives 32
KB clusters, hence fixing the "definitive" limit for the FAT16 partition size at 2 gigabytes. On
magneto-optical media, which can have 1 or 2 KB sectors, the limit is proportionally greater.
Much later, Windows NT increased the maximum cluster size to 64 KB by considering the
sectors-per-cluster count as unsigned. However, the resulting format was not compatible with any
other FAT implementation of the time, and in any case it generated massive internal fragmentation.
Windows 98 also supported reading and writing this variant, but its disk utilities did not work with it.
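The 2-gigabyte figure can be checked directly from the numbers above (a quick arithmetic sketch,
using the 65,517-cluster limit quoted earlier):

max_clusters = 65517                   # FAT16 cluster-count limit quoted earlier
sectors_per_cluster = 64               # largest power-of-two value of the signed 8-bit field
bytes_per_sector = 512
cluster_size = sectors_per_cluster * bytes_per_sector    # 32768 bytes = 32 KB
print(max_clusters * cluster_size)     # 2146861056 bytes, just under 2 GiB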

Long File Names (VFAT, LFNs)

One of the "user experience" goals for the designers of Windows 95 was the ability to use
long file names (LFNs), in addition to classic 8.3 names. LFNs were implemented using a
workaround in the way directory entries are laid out (see below). The version of the file system
with this extension is usually known as VFAT, after the Windows 95 VxD device driver.

Interestingly, the VFAT driver actually appeared before Windows 95, in Windows for
Workgroups 3.11, but was only used for implementing 32-bit File Access, a higher performance
protected mode file access method, bypassing DOS and directly using either the BIOS, or, better,
the Windows-native protected mode disk drivers. It was a backport; Microsoft's ads for WfW 3.11
said 32-bit File Access was based on "the 32-bit file system from our Chicago project".

In Windows NT, support for long file names on FAT started from version 3.5.

FAT32

In order to overcome the volume size limit of FAT16, while still allowing DOS real-mode
code to handle the format without unnecessarily reducing the available conventional memory,
Microsoft decided to implement a newer generation of FAT, known as FAT32, with cluster counts
held in a 32-bit field, of which 28 bits are currently used.

In theory, this should support a total of approximately 268,435,438 (< 2^28) clusters,
allowing for drive sizes in the range of 2 terabytes. However, due to limitations in Microsoft's
ScanDisk utility, the FAT is not allowed to grow beyond 4,177,920 (< 2^22) clusters, placing the
volume limit at 124.55 gigabytes, unless ScanDisk support is not needed.

FAT32 was introduced with Windows 95 OSR2, although reformatting was needed to use it,
and DriveSpace 3 (the version that came with Windows 95 OSR2 and Windows 98) never
supported it. Windows 98 introduced a utility to convert existing hard disks from FAT16 to FAT32
without loss of data. In the NT line, support for FAT32 arrived in Windows 2000.

Windows 2000 and Windows XP can read and write to FAT32 filesystems of any size, but
the format program on these platforms can only create FAT32 filesystems up to 32 GB. Thompson
and Thompson (2003) write[4] that "Bizarrely, Microsoft states that this behavior is by design."
Microsoft's knowledge base article 184006[3] indeed confirms the limitation and the by design
statement, but gives no rationale or explanation. Peter Norton's opinion[5] is that "Microsoft has
intentionally crippled the FAT32 file system."

The maximum possible size for a file on a FAT32 volume is 4 GiB minus 1 byte (2^32 - 1 bytes).
For most users, this has become the most nagging limit of FAT32 as of 2005, since video capture
and editing applications can easily exceed this limit, as can the system swap file.

Third party support

The alternative IBM PC operating systems — such as Linux, FreeBSD, and BeOS — have
all supported FAT, and most added support for VFAT and FAT32 shortly after the corresponding
Windows versions were released. Early Linux distributions also supported a format known as
UMSDOS, which was FAT with Unix file attributes (such as long file name and access
permissions) stored in a separate file called --linux-.---. UMSDOS fell into disuse after VFAT was
released and is not enabled by default in Linux kernels from version 2.5.7 onwards [5]. The Mac OS
X operating system also supports the FAT filesystems on volumes other than the boot disk.

FAT and Alternate Data Streams

The FAT filesystem itself is not designed to support alternate data streams (ADS), but some
operating systems that heavily depend on them have devised various methods for handling them on
FAT drives. Such
methods either store the additional information in extra files and directories (Mac OS), or give new
semantics to previously unused fields of the FAT on-disk data structures (OS/2 and Windows NT).
The second design, while presumably more efficient, prevents any copying or backing-up of those
volumes using non-aware tools; manipulating such volumes using non-aware disk utilities (e.g.
defragmenters or CHKDSK) will probably lose the information.

Mac OS using PC Exchange stores its various dates, file attributes and long filenames in a
hidden file called FINDER.DAT, and Resource Forks (a common Mac OS ADS) in a subdirectory
called RESOURCE.FRK, in every directory where they are used. From PC Exchange 2.1 onwards,
Mac OS long filenames are stored as standard FAT long filenames, and FAT filenames longer than
31 characters are converted to unique 31-character filenames, which can then be made visible to
Macintosh applications.

Mac OS X stores metadata (Resource Forks, file attributes, other ADS) in a hidden file with
a name constructed from the owner filename prefixed with "._", and Finder stores some folder and
file metadata in a hidden file called ".DS_Store".

OS/2 heavily depends on extended attributes (EAs) and stores them in a hidden file called
"EA DATA. SF" in the root directory of the FAT12 or FAT16 volume. This file is indexed by 2
previously reserved bytes in the file's (or directory's) directory entry. In the FAT32 format, these
bytes hold the upper 16 bits of the starting cluster number of the file or directory, hence making it
difficult to store EAs on FAT32. Extended attributes are accessible via the Workplace Shell
desktop, through REXX scripts, and many system GUI and command-line utilities (such as 4OS2).
Windows NT supports the handling of extended attributes in HPFS, NTFS, and FAT. It stores EAs
on FAT using exactly the same scheme as OS/2, but does not support any other kind of ADS as
held on NTFS volumes. Trying to copy a file with any ADS other than EAs from an NTFS volume
to a FAT volume gives a warning message with the names of the ADSs that will be lost.

Windows 2000 onward acts exactly as Windows NT, except that it ignores EAs when
copying to FAT32 without any warning (but shows the warning for other ADSs, like "Macintosh
Finder Info" and "Macintosh Resource Fork").

Future

Microsoft has recently secured patents for VFAT and FAT32 (but not the original FAT),
which is causing concern that the company might later seek royalties from Linux distros and from
media vendors that pre-format their products (see FAT Licensing below). Despite two earlier
rulings against them, Microsoft prevailed and was awarded the patents.

Since Microsoft has announced the discontinuation of its MS-DOS-based consumer
operating systems with Windows Me, it remains unlikely that any new versions of FAT will appear.
For most purposes, the NTFS file system that was developed for the Windows NT line is superior to
FAT from the points of view of efficiency, performance and reliability; its main drawbacks are the
size overhead for small volumes and the very limited support by anything other than the NT-based
versions of Windows, since the exact specification is a trade secret of Microsoft, which in turn
makes it difficult to use a DOS floppy for recovery purposes. Microsoft provided a recovery
console to work around this issue, but for security reasons it severely limited what could be done
through the Recovery Console by default.

FAT is still the normal filesystem for removable media (with the exception of CDs and
DVDs), with FAT12 used on floppies, and FAT16 on most other removable media (such as flash
memory cards for digital cameras and USB flash drives). Most removable media is not yet large
enough to benefit from FAT32, although some larger flash drives do make use of it. FAT is used on
these drives for reasons of compatibility and size overhead, as well as the fact that file permissions
on removable media are likely to be more trouble than they are worth.

The FAT32 formatting support in Windows 2000 and XP is limited to drives of 32
gigabytes, which effectively forces users of modern hard drives either to use NTFS or to format the
drive using third party tools such as a port of mkdosfs or fat32format.

Main disk structures

Master Boot Record | File Allocation Table #1 | File Allocation Table #2 | Root Directory | All Other Data ... The Rest of the Disk

A FAT file system is composed of four different sections.

1. The Reserved sectors, located at the very beginning. The first reserved sector is the Boot
Sector (aka Partition Boot Record). It includes an area called the BIOS Parameter Block (with some
basic file system information, in particular its type, and pointers to the location of the other
sections) and usually contains the operating system's boot loader code. The total count of reserved
sectors is indicated by a field inside the Boot Sector. Important information from the Boot Sector is
accessible through an operating system structure called the Drive Parameter Block in DOS and
OS/2.

2. The FAT Region. This contains two copies of the File Allocation Table for the sake of
redundancy, although the extra copy is rarely used, even by disk repair utilities. These are maps of
the partition, indicating how the clusters are allocated.

3. The Root Directory Region. This is a Directory Table that stores information about the
files and directories in the root directory. With FAT32 it can be stored anywhere within the
partition; with earlier versions it is always located immediately after the FAT Region.

4. The Data Region. This is where the actual file and directory data is stored and takes up
most of the partition. The size of files and subdirectories can be increased arbitrarily (as long as
there are free clusters) by simply adding more links to the file's chain in the FAT. Note, however,
that each cluster can be held by only one file, so if a 1 KB file resides in a 32 KB cluster, 31 KB
are wasted.
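A small sketch of that allocation arithmetic, assuming 32 KB clusters (both function names are
illustrative):

def clusters_needed(file_size, cluster_size=32 * 1024):
    """Number of clusters a file occupies: files always take whole clusters."""
    return -(-file_size // cluster_size)            # ceiling division

def wasted_bytes(file_size, cluster_size=32 * 1024):
    """Slack space: bytes allocated to a file beyond its actual size."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size

print(wasted_bytes(1 * 1024))          # 31744 bytes (31 KB) wasted by a 1 KB file, as noted above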

Boot Sector
The Boot Sector is of the following format:
Byte Offset   Length (bytes)   Description
0x00          3                Jump instruction (to skip over header on boot)
0x03          8                OEM Name (padded with spaces). MS-DOS checks this field to determine
                               which other parts of the boot record can be relied on [6] [7]. Common
                               values are IBM  3.3 (with two spaces between the "IBM" and the "3.3")
                               and MSDOS5.0.
0x0b          2                Bytes per sector. The BIOS Parameter Block starts here.
0x0d          1                Sectors per cluster
0x0e          2                Reserved sector count (including boot sector)
0x10          1                Number of file allocation tables
0x11          2                Maximum number of root directory entries
0x13          2                Total sectors (if zero, use 4-byte value at offset 0x20)
0x15          1                Media descriptor:
                               0xF8  Single sided, 80 tracks per side, 9 sectors per track
                               0xF9  Double sided, 80 tracks per side, 9 sectors per track
                               0xFA  Single sided, 80 tracks per side, 8 sectors per track
                               0xFB  Double sided, 80 tracks per side, 8 sectors per track
                               0xFC  Single sided, 40 tracks per side, 9 sectors per track
                               0xFD  Double sided, 40 tracks per side, 9 sectors per track
                               0xFE  Single sided, 40 tracks per side, 8 sectors per track
                               0xFF  Double sided, 40 tracks per side, 8 sectors per track
                               The same media descriptor value should be repeated as the first byte of
                               each copy of the FAT. Certain operating systems (MSX-DOS version 1.0)
                               ignore the boot sector parameters altogether and use the media descriptor
                               value from the first byte of the FAT to determine filesystem parameters.
0x16          2                Sectors per file allocation table (FAT16)
0x18          2                Sectors per track
0x1a          2                Number of heads
0x1c          4                Hidden sectors
0x20          4                Total sectors (if greater than 65535; see offset 0x13)
0x24          4                Sectors per file allocation table (FAT32). The Extended BIOS Parameter
                               Block starts here.
0x24          1                Physical drive number (FAT16)
0x25          1                Current head (FAT16)
0x26          1                Signature (FAT16)
0x27          4                ID (FAT16)
0x2b          11               Volume Label
0x36          8                FAT file system type (e.g. FAT, FAT12, FAT16, FAT32)
0x3e          448              Operating system boot code
0x1FE         2                End of sector marker (0x55 0xAA)

The boot sector is portrayed here as found on e.g. an OS/2 1.3 boot diskette. Earlier versions used a
shorter BIOS Parameter Block and their boot code would start earlier (for example at offset 0x2b in
OS/2 1.1).
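As an illustration of how these fields line up on disk, the following Python sketch decodes a few
BIOS Parameter Block values from the first 512 bytes of a volume, using the offsets from the table
above (a minimal sketch rather than a full parser; the image file name is hypothetical):

import struct

def read_bpb(boot_sector):
    """Decode selected BIOS Parameter Block fields from a FAT boot sector (a bytes object)."""
    return {
        "oem_name":            boot_sector[0x03:0x0B].decode("ascii", "replace").rstrip(),
        "bytes_per_sector":    struct.unpack_from("<H", boot_sector, 0x0B)[0],
        "sectors_per_cluster": boot_sector[0x0D],
        "reserved_sectors":    struct.unpack_from("<H", boot_sector, 0x0E)[0],
        "number_of_fats":      boot_sector[0x10],
        "root_dir_entries":    struct.unpack_from("<H", boot_sector, 0x11)[0],
        "total_sectors_16":    struct.unpack_from("<H", boot_sector, 0x13)[0],
        "media_descriptor":    boot_sector[0x15],
        "sectors_per_fat_16":  struct.unpack_from("<H", boot_sector, 0x16)[0],
        "total_sectors_32":    struct.unpack_from("<I", boot_sector, 0x20)[0],
        "end_marker":          struct.unpack_from("<H", boot_sector, 0x1FE)[0],  # expected 0xAA55
    }

# Hypothetical usage: inspect the boot sector of a floppy or partition image
with open("floppy.img", "rb") as image:
    print(read_bpb(image.read(512)))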

Exceptions

The implementation of FAT used in MS-DOS for the Apricot PC had a different boot sector
layout, to accommodate that computer's non-IBM compatible BIOS. The jump instruction and
OEM name were omitted, and the MS-DOS filesystem parameters (offsets 0x0B - 0x17 in the
standard sector) were located at offset 0x50. Later versions of Apricot MS-DOS gained the ability
to read and write disks with the standard boot sector in addition to those with the Apricot one.

DOS Plus on the BBC Master 512 did not use conventional boot sectors at all. Data disks
omitted the boot sector and began with a single copy of the FAT (the first byte of the FAT was used
to determine disk capacity) while boot disks began with a miniature ADFS filesystem containing
the boot loader, followed by a single FAT. It could also access standard PC disks formatted to 180
KB or 360 KB, again using the first byte of the FAT to determine capacity.
