Sunday, August 28, 2011

File management:


A file system (filesystem) is a means of organizing data that is expected to be retained after a program terminates. It provides procedures to store, retrieve, and update data, and it manages the available space on the device(s) that contain it.


Without a file system, programs could not access data by file name or directory and would instead have to access data regions on a storage device directly.



File systems are used on data storage devices such as magnetic disks or optical discs to keep track of the physical location of computer files. They may provide access to data on a file server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients), or they may be virtual and exist only as an access method for virtual data (e.g., procfs). A file system is distinct from a directory service or a registry.


File names
Files are an abstraction mechanism.  They provide a way to store information on the disk and read it back later.
This must be done in such a way as to shield the user from the details of how and where the information is stored, and how the disks actually work.
File naming: the most important characteristic of any abstraction mechanism is the way the objects being managed are named.
·         Some file systems support names as long as 255 characters.
·         Some file systems distinguish between upper- and lowercase letters, whereas others do not.
·         Many operating systems support two-part file names, with the two parts separated by a period. The part after the period is called the file extension and usually indicates something about the file.
Example: in MS-DOS, file names are 1 to 8 characters, plus an optional extension of 1 to 3 characters.
In UNIX, the size of the extension, if any, is up to the user, and a file may even have two or more extensions, as in prog.c.Z.
Extension     Meaning
file.bak      Backup file
file.c        C source program
file.gif      CompuServe Graphical Interchange Format image
file.hlp      Help file
file.html     World Wide Web HyperText Markup Language document
file.jpg      Still picture encoded with the JPEG standard
file.mp3      Music encoded in MPEG layer 3 audio format
file.mpg      Movie encoded with the MPEG standard
file.o        Object file (compiler output, not yet linked)
file.pdf      Portable Document Format file
file.ps       PostScript file
file.tex      Input for the TeX formatting program
file.txt      General text file
file.zip      Compressed archive
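Because a UNIX extension is only a convention, a program that wants to act on it has to split the name itself. Here is a minimal Python sketch, assuming the standard `pathlib` module: `PurePosixPath.suffixes` returns every extension from left to right, so `prog.c.Z` yields both `.c` and `.Z`. The `MEANINGS` table (a few rows copied from the list above) and the helper names `split_name` and `guess_meaning` are hypothetical, chosen for illustration.

```python
from pathlib import PurePosixPath

# A few rows from the table above; the mapping is purely conventional.
MEANINGS = {
    ".bak": "Backup file",
    ".c": "C source program",
    ".html": "World Wide Web HyperText Markup Language document",
    ".o": "Object file (compiler output, not yet linked)",
    ".txt": "General text file",
    ".zip": "Compressed archive",
}


def split_name(name):
    """Return the base name and the list of extensions, left to right."""
    path = PurePosixPath(name)
    suffixes = path.suffixes
    base = path.name[: len(path.name) - len("".join(suffixes))]
    return base, suffixes


def guess_meaning(name):
    """Describe a file by its last extension only, as a file browser might."""
    _, suffixes = split_name(name)
    if not suffixes:
        return "unknown"
    return MEANINGS.get(suffixes[-1], "unknown")


print(split_name("prog.c.Z"))   # ('prog', ['.c', '.Z'])
print(guess_meaning("file.o"))  # Object file (compiler output, not yet linked)
```

Looking only at the last extension mirrors how `prog.c.Z` is first of all a compressed file and only a C source once decompressed.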

File Operations
Files exist to store information and allow it to be retrieved later. Different systems provide different operations to allow storage and retrieval. Below is a discussion of the most common system calls relating to files.

1.    Create. The file is created with no data. The purpose of the call is to announce that the file is coming and to set some of the attributes.

2.    Delete. When the file is no longer needed, it has to be deleted to free up disk space. There is always a system call for this purpose.

3.    Open. Before using a file, a process must open it. The purpose of the open call is to allow the system to fetch the attributes and list of disk addresses into main memory for rapid access on later calls.

4.    Close. When all the accesses are finished, the attributes and disk addresses are no longer needed, so the file should be closed to free up internal table space. Many systems encourage this by imposing a maximum number of open files on processes. A disk is written in blocks, and closing a file forces writing of the file's last block, even though that block may not be entirely full yet.

5.    Read. Data are read from the file. Usually, the bytes come from the current position. The caller must specify how much data are needed and must also provide a buffer to put them in.

6.    Write. Data are written to the file, again, usually at the current position. If the current position is the end of the file, the file's size increases. If the current position is in the middle of the file, existing data are overwritten and lost forever.

7.    Append. This call is a restricted form of write. It can only add data to the end of the file. Systems that provide a minimal set of system calls do not generally have append, but many systems provide multiple ways of doing the same thing, and these systems sometimes have append.

8.    Seek. For random access files, a method is needed to specify from where to take the data. One common approach is a system call, seek, that repositions the file pointer to a specific place in the file. After this call has completed, data can be read from, or written to, that position.

9.    Get attributes. Processes often need to read file attributes to do their work. For example, the UNIX make program is commonly used to manage software development projects consisting of many source files. When make is called, it examines the modification times of all the source and object files and arranges for the minimum number of compilations required to bring everything up to date. To do its job, it must look at the attributes, namely, the modification times.  
            
10.  Set attributes. Some of the attributes are user settable and can be changed after the file has been created. This system call makes that possible. The protection mode information is an obvious example. Most of the flags also fall in this category.

11.  Rename. It frequently happens that a user needs to change the name of an existing file. This system call makes that possible. It is not always strictly necessary, because the file can usually be copied to a new file with the new name, and the old file then deleted.
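To make the list above concrete, here is a minimal sketch in Python, assuming a POSIX system. Python's `os` module exposes thin wrappers around the UNIX calls (`os.open`, `os.write`, `os.lseek`, `os.read`, `os.close`, `os.stat`, `os.chmod`, `os.rename`, `os.remove`), so each line maps onto one of the numbered operations. The helper `demo_file_calls` and the file names are hypothetical.

```python
import os
import stat
import tempfile


def demo_file_calls(directory):
    """Walk through the file system calls listed above on a scratch file."""
    path = os.path.join(directory, "notes.txt")

    # 1 + 3. Create and open: O_CREAT makes the (empty) file, O_RDWR allows reads and writes.
    fd = os.open(path, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o644)
    try:
        os.write(fd, b"hello, world")   # 6. Write at the current position (byte 0)
        os.lseek(fd, 7, os.SEEK_SET)    # 8. Seek: move the file pointer to byte 7
        os.write(fd, b"files")          # 6. Overwrites "world"; the old bytes are lost
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)         # 5. Read up to 100 bytes from the current position
    finally:
        os.close(fd)                    # 4. Close: release the open-file table entry

    # 7. Append: with O_APPEND every write goes to the end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, b"!")
    finally:
        os.close(fd)

    size = os.stat(path).st_size        # 9. Get attributes (here, the byte count)
    os.chmod(path, 0o600)               # 10. Set attributes (the protection mode)
    mode = stat.S_IMODE(os.stat(path).st_mode)

    new_path = os.path.join(directory, "renamed.txt")
    os.rename(path, new_path)           # 11. Rename
    os.remove(new_path)                 # 2. Delete (unlink on UNIX)
    return data, size, mode


with tempfile.TemporaryDirectory() as scratch:
    print(demo_file_calls(scratch))     # (b'hello, files', 13, 384)
```

The value 384 is simply 0o600 printed in decimal. Note how the second write replaced bytes in the middle of the file without changing its size, while the append grew it by one byte.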

 

Directories

File systems typically have directories (sometimes called folders) which allow the user to group files. This may be implemented by connecting the file name to an index in a table of contents or an inode in a Unix-like file system. Directory structures may be flat (i.e. linear), or allow hierarchies where directories may contain subdirectories.

The first file system to support arbitrary hierarchies of directories was the file system in the Multics operating system.

The native file systems of Unix-like systems also support arbitrary directory hierarchies, as do, for example, Apple's Hierarchical File System and its successor HFS+ in classic Mac OS (HFS+ is still used in Mac OS X), the FAT file system in MS-DOS 2.0 and later and Microsoft Windows, the NTFS file system in the Windows NT family of operating systems, and the ODS-2 and higher levels of the Files-11 file system in OpenVMS.

Directory Operations

The allowed system calls for managing directories exhibit more variation from system to system than system calls for files. To give an impression of what they are and how they work, we will give a sample (taken from UNIX).
1.    Create. A directory is created. It is empty except for dot and dotdot, which are put there automatically by the system (or, in a few cases, by the mkdir program).

2.    Delete. A directory is deleted. Only an empty directory can be deleted. A directory containing only dot and dotdot is considered empty as these cannot usually be deleted.

3.    Opendir. Directories can be read. For example, to list all the files in a directory, a listing program opens the directory to read out the names of all the files it contains. Before a directory can be read, it must be opened, analogous to opening and reading a file.

4.    Closedir. When a directory has been read, it should be closed to free up internal table space.

5.    Readdir. This call returns the next entry in an open directory. Formerly, it was possible to read directories using the usual read system call, but that approach has the disadvantage of forcing the programmer to know and deal with the internal structure of directories. In contrast, readdir always returns one entry in a standard format, no matter which of the possible directory structures is being used.

6.    Rename. In many respects, directories are just like files and can be renamed the same way files can be.

7.    Link. Linking is a technique that allows a file to appear in more than one directory. This system call specifies an existing file and a path name, and creates a link from the existing file to the name specified by the path. In this way, the same file may appear in multiple directories. A link of this kind, which increments the counter in the file's i-node (to keep track of the number of directory entries containing the file), is sometimes called a hard link.

8.    Unlink. A directory entry is removed. If the file being unlinked is only present in one directory (the normal case), it is removed from the file system. If it is present in multiple directories, only the path name specified is removed. The others remain. In UNIX, the system call for deleting files is, in fact, unlink.
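The directory calls can be tried the same way. This is a minimal sketch, assuming a POSIX system and a file system that supports hard links: `os.mkdir`, `os.rmdir`, `os.link` and `os.unlink` wrap the calls of the same names, and on POSIX `os.scandir` is built on opendir/readdir/closedir. The helper `demo_directory_calls` and all file names are hypothetical.

```python
import os
import tempfile


def demo_directory_calls(root):
    """Exercise the directory calls listed above inside a scratch directory."""
    docs = os.path.join(root, "docs")
    os.mkdir(docs)                              # 1. Create

    original = os.path.join(docs, "a.txt")
    with open(original, "w") as f:
        f.write("shared")

    other = os.path.join(root, "b.txt")
    os.link(original, other)                    # 7. Link: a second name for the same i-node
    links_after_link = os.stat(original).st_nlink

    # 3-5. Opendir/readdir/closedir: os.scandir yields one entry at a time and
    # closes the directory when the with-block ends ('.' and '..' are skipped).
    with os.scandir(docs) as entries:
        names = sorted(entry.name for entry in entries)

    try:
        os.rmdir(docs)                          # 2. Delete fails: docs is not empty
        removed_while_full = True
    except OSError:
        removed_while_full = False

    os.unlink(original)                         # 8. Unlink one name; the data survives
    links_after_unlink = os.stat(other).st_nlink

    os.rmdir(docs)                              # 2. Delete now succeeds
    os.unlink(other)                            # last link gone, so the file is freed
    return names, links_after_link, links_after_unlink, removed_while_full


with tempfile.TemporaryDirectory() as scratch:
    print(demo_directory_calls(scratch))        # (['a.txt'], 2, 1, False)
```

The link count going from 2 to 1 is the i-node counter described under Link: the file's data is released only when the last directory entry pointing at it is unlinked.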

 

Metadata


Other bookkeeping information is typically associated with each file within a file system. The length of the data contained in a file may be stored as the number of blocks allocated for the file or as a byte count. The time that the file was last modified may be stored as the file's timestamp. File systems might store the file creation time, the time it was last accessed, the time the file's metadata was changed, or the time the file was last backed up. Other information can include the file's device type (e.g., block, character, socket, subdirectory, etc.), its owner user ID and group ID, and its access permission settings (e.g., whether the file is read-only, executable, etc.).
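On a UNIX-like system most of this metadata is returned by a single stat call. The following is a minimal Python sketch, assuming a POSIX system: `os.lstat` and the `stat` module helpers are real, while the helper `describe` and its dictionary keys are hypothetical. `st_blocks` is read with `getattr` because not every platform provides it, and `st_ctime` means "metadata changed" on UNIX but creation time on Windows.

```python
import os
import stat
import tempfile
import time

# Map the file-type bits of st_mode to a readable name.
KINDS = [
    (stat.S_ISREG, "regular file"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISLNK, "symbolic link"),
    (stat.S_ISCHR, "character device"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISSOCK, "socket"),
    (stat.S_ISFIFO, "named pipe"),
]


def describe(path):
    """Collect the bookkeeping information kept for one file."""
    st = os.lstat(path)                              # lstat: do not follow symlinks
    kind = next((name for test, name in KINDS if test(st.st_mode)), "other")
    return {
        "type": kind,
        "size_bytes": st.st_size,                    # length as a byte count
        "blocks_512": getattr(st, "st_blocks", None),  # allocated 512-byte blocks
        "modified": time.ctime(st.st_mtime),
        "accessed": time.ctime(st.st_atime),
        "changed": time.ctime(st.st_ctime),
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "permissions": stat.filemode(st.st_mode),
    }


with tempfile.TemporaryDirectory() as scratch:
    sample = os.path.join(scratch, "sample.txt")
    with open(sample, "wb") as f:
        f.write(b"hello")
    os.chmod(sample, 0o640)
    info = describe(sample)
    print(info["type"], info["size_bytes"], info["permissions"])  # regular file 5 -rw-r-----
```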

Utilities

File systems include utilities to initialize, alter parameters of and remove an instance of the filesystem.

Some include the ability to extend or truncate the space allocated to the file system.

  • Directory utilities create, rename, and delete directory entries and alter metadata associated with a directory. They may include a means to create additional links to a directory (hard links in Unix), rename parent links (".." in Unix-like OSes), and create bidirectional links to files.
  • File utilities create, list, copy, move, and delete files, and alter metadata. They may be able to truncate data, truncate or extend space allocation, append to, move, and modify files in place.
  • Also in this category are utilities to free space for deleted files if the file system provides an undelete function.
  • Some file systems defer operations such as reorganization of free space, secure erasing of free space, and rebuilding of hierarchical structures, and instead provide utilities (for example, defragmentation utilities) that perform these functions at times of minimal activity.

Restricting and permitting access

There are several mechanisms used by file systems to control access to data.

  • Usually the intent is to prevent reading or modifying files by a user or group of users.
  • Another reason is to ensure that data is modified in a controlled way, so access may be restricted to a specific program.
  • Examples include
  1.  passwords stored in the metadata of the file or elsewhere and 
  2.  file permissions in the form of permission bits, access control lists, or capabilities. The need for filesystem utilities to be able to access the data at the media level to reorganize the structures and provide efficient backup usually means that these are only effective for polite users but are not effective against intruders.
  3. Methods for encrypting file data are sometimes included in the filesystem. This is very effective since there is no need for filesystem utilities to know the encryption seed to effectively manage the data. The risks of relying on encryption include the fact that an attacker can copy the data and use brute force to decrypt the data. Losing the seed means losing the data.
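Permission bits are the most common of these mechanisms. The sketch below shows how a UNIX-style read check picks exactly one of the owner, group, and other classes. It is a simplified model, not the kernel's actual code: the function `may_read` and the sample IDs 1000, 2000, 100, and 200 are hypothetical, and the superuser, ACLs, and capabilities are deliberately ignored. The `stat.S_IR*` constants and `stat.filemode` are real standard-library names.

```python
import stat


def may_read(mode, file_uid, file_gid, uid, gids):
    """Simplified UNIX read check using the nine permission bits.

    Exactly one class applies: owner if the uids match, else group if the
    file's group is one of the caller's groups, else other. The superuser,
    ACLs and capabilities are deliberately ignored.
    """
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


mode = stat.S_IFREG | 0o640
print(stat.filemode(mode))                      # -rw-r-----
print(may_read(mode, 1000, 100, 1000, {100}))   # True  (owner may read)
print(may_read(mode, 1000, 100, 2000, {100}))   # True  (group may read)
print(may_read(mode, 1000, 100, 2000, {200}))   # False (others may not)
```

Because only one class is consulted, a mode such as 0o040 denies the owner even though group members may read the file.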

Maintaining integrity

One of a file system's most significant responsibilities is to ensure that, regardless of the actions of programs accessing the data, the structure remains consistent.

  • This includes the actions taken if a program modifying data terminates abnormally or neglects to inform the file system that it has completed its activities.
  • This may include updating the metadata and the directory entry, and handling any data that was buffered but not yet written to the physical storage media.
  • Other failures that the file system must deal with include media failures and loss of connection to remote systems.
  • In the event of an operating system failure or "soft" power failure, special routines in the file system must be invoked, similar to when an individual program fails.
  • The file system must also be able to correct damaged structures. These may result from an operating system failure of which the OS was unable to notify the file system, a power failure, or a reset.
  • The file system must also record events to allow analysis of systemic issues as well as problems with specific files or directories.
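Applications can help with the "buffered but not yet written" case from their side. A widely used technique, shown here as a minimal Python sketch assuming a POSIX system, is to write a temporary file in the same directory, flush it to the device with `fsync`, and then `os.replace` it over the original, so a crash leaves either the old contents or the new ones. The helper name `atomic_write` is hypothetical; `tempfile.mkstemp`, `os.fsync`, and `os.replace` are real standard-library calls. This complements, and does not replace, the file system's own consistency mechanisms.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path's contents so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, hence on the same
    # file system, which is what makes the final rename atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()               # Python's buffer -> kernel
            os.fsync(f.fileno())    # kernel's buffers -> storage device
        os.replace(tmp_path, path)  # atomically swap the directory entry
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also flush the directory so the new entry itself survives a crash (POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "settings.conf")
    atomic_write(target, b"version=1\n")
    atomic_write(target, b"version=2\n")
    with open(target, "rb") as f:
        print(f.read())             # b'version=2\n'
```

One side effect to keep in mind: `mkstemp` creates the file with mode 0o600, so the replaced file ends up with those permissions unless they are set explicitly.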

Types of file systems

File system types can be classified into disk/tape file systems, network file systems and special purpose file systems.

  • Disk file systems
In computing, disk file systems are file systems that manage data on permanent storage devices. As magnetic disks are the most common of such devices, most disk file systems are designed to perform well in spite of the seek latencies inherent in such media.
·         Examples include FAT (FAT12, FAT16, FAT32, exFAT), NTFS, HFS and HFS+, HPFS, UFS, ext2, ext3, ext4, btrfs, ISO 9660, Files-11, Veritas File System, VMFS, ZFS, ReiserFS, and UDF. Some disk file systems are journaling file systems or versioning file systems.

  • Optical discs
ISO 9660 and Universal Disk Format (UDF) are two common formats that target Compact Discs, DVDs and Blu-ray discs. Mount Rainier is an extension to UDF supported by Linux 2.6 series and Windows Vista that facilitates rewriting to DVDs.

  • Flash file systems
A flash file system takes into account the special abilities, performance, and restrictions of flash memory devices. A disk file system can often use a flash memory device as the underlying storage medium, but a file system designed specifically for flash devices is usually a much better choice.

  • Tape file systems
A tape file system is a file system and tape format designed to store files on tape in a self-describing form. Magnetic tapes are sequential storage media with significantly longer random data access times than disks, posing challenges to the creation and efficient management of a general-purpose file system.

  • Tape formatting
Writing data to a tape is often a significantly time-consuming process that may take several hours. Similarly, completely erasing or formatting a tape can also take several hours. With many data tape technologies, it is not necessary to format the tape before overwriting it with new data. This is due to the inherently destructive nature of overwriting data on sequential media.

Because of the time it can take to format a tape, typically tapes are pre-formatted so that the tape user does not need to spend time preparing each new tape for use. All that is usually necessary is to write an identifying media label to the tape before use, and even this can be automatically written by software when a new tape is used for the first time.

  • Database file systems
Another concept for file management is the idea of a database-based file system. Instead of, or in addition to, hierarchical structured management, files are identified by their characteristics, like type of file, topic, author, or similar rich metadata. [1]

Many web content management systems (CMSs) use a relational DBMS to store and retrieve files. For example, XHTML files are stored as XML or text fields and image files as BLOB fields; SQL SELECT statements (optionally with XPath) retrieve the files and allow more sophisticated logic and richer information associations than "usual" file systems.
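The idea can be sketched in a few lines with Python's built-in `sqlite3` module. This is a toy illustration, not how any particular CMS is built: the table `files`, its columns, the helpers `build_store` and `find`, and the sample rows are all hypothetical. Files are located by their characteristics (type and author) rather than by a directory path.

```python
import sqlite3


def build_store():
    """An in-memory table holding file contents next to descriptive metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " name TEXT PRIMARY KEY,"
        " mime_type TEXT,"
        " author TEXT,"
        " content BLOB)"
    )
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?)",
        [
            ("index.html", "text/html", "alice", b"<p>Welcome</p>"),
            ("about.html", "text/html", "bob", b"<p>About us</p>"),
            ("logo.gif", "image/gif", "bob", b"GIF89a"),  # placeholder bytes, not a real image
        ],
    )
    return conn


def find(conn, mime_type, author):
    """Select files by their characteristics rather than by a directory path."""
    rows = conn.execute(
        "SELECT name FROM files WHERE mime_type = ? AND author = ? ORDER BY name",
        (mime_type, author),
    )
    return [name for (name,) in rows]


store = build_store()
print(find(store, "text/html", "bob"))   # ['about.html']
```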

Very large file systems, embodied by applications like Apache Hadoop and Google File System, use some database file system concepts.


  • Transactional file systems
Transactional NTFS allows files and directories to be modified, created, renamed, and deleted atomically. Using transactions ensures correctness of operation: in a series of file operations done as a transaction, the transaction is committed only if all the operations succeed. In case of any failure, the entire transaction is rolled back and fails.

  • Network file systems
A network file system is a file system that acts as a client for a remote file access protocol, providing access to files on a server. Examples of network file systems include clients for the NFS, AFS, SMB protocols, and file-system-like clients for FTP and WebDAV.
  • Shared disk file systems
A shared disk file system is one in which a number of machines (usually servers) all have access to the same external disk subsystem (usually a SAN). The file system arbitrates access to that subsystem, preventing write collisions. Examples include GFS from Red Hat, GPFS from IBM, and SFS from DataPlow.
  • Special file systems
A special file system presents non-file elements of an operating system as files so they can be acted on using file system APIs. This is most commonly done in Unix-like operating systems, but devices are given file names in some non-Unix-like operating systems as well.
  • Device file systems
A device file system represents I/O devices and pseudo-devices as files, called device files. Examples in Unix-like systems include devfs and, in Linux 2.6 systems, udev. In non-Unix-like systems, such as TOPS-10 and other operating systems influenced by it, where the full filename or pathname of a file can include a device prefix, devices other than those containing file systems are referred to by a device prefix specifying the device, without anything following it.
  • Others
In the Linux kernel, configfs and sysfs provide files that can be used to query the kernel for information and configure entities in the kernel.
procfs maps processes and, on Linux, other operating system structures into a filespace.
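Because procfs presents kernel data as ordinary text files, plain file I/O is enough to query it. This is a minimal Python sketch, assuming Linux with procfs mounted at /proc; the helper `proc_status` is hypothetical, and the exact set of fields in `/proc/<pid>/status` varies between kernel versions, so only a few common ones are read.

```python
import os


def proc_status(pid="self"):
    """Parse /proc/<pid>/status on Linux into a {field: value} dict."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


if os.path.exists("/proc/self/status"):   # procfs is mounted (typically Linux)
    info = proc_status()
    print(info["Name"], info["Pid"], info.get("VmRSS", "n/a"))
```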

Microsoft Windows



[Figure: Directory listing in a Windows command shell]


Windows makes use of the FAT and NTFS file systems.

Windows uses a drive letter abstraction at the user level to distinguish one disk or partition from another. For example, the path C:\WINDOWS represents a directory WINDOWS on the partition represented by the letter C. The C drive is most commonly used for the primary hard disk partition, on which Windows is usually installed and from which it boots. This "tradition" has become so firmly ingrained that bugs appeared in older applications that assumed the operating system was installed on drive C. The use of drive letters, and the tradition of using "C" as the drive letter for the primary hard disk partition, can be traced to MS-DOS, where the letters A and B were reserved for up to two floppy disk drives. This in turn derived from CP/M in the 1970s, and ultimately from IBM's CP/CMS of 1967.

Network drives may also be mapped to drive letters.
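Python's `pathlib.PureWindowsPath` parses Windows paths on any operating system, which makes the drive-letter convention easy to inspect. In this minimal sketch the helper `drive_letter` is hypothetical; it treats a path as having a drive letter only when its `.drive` is a single letter followed by a colon. A UNC path such as `\\server\share\...`, which names a network share directly instead of through a mapped letter, therefore reports none.

```python
from pathlib import PureWindowsPath


def drive_letter(raw):
    """Return the upper-case drive letter of a Windows path, or None."""
    drive = PureWindowsPath(raw).drive   # 'C:' for C:\WINDOWS; the share prefix for UNC paths
    if len(drive) == 2 and drive[1] == ":":
        return drive[0].upper()
    return None


for raw in [r"C:\WINDOWS", r"d:\Data\report.txt", r"\\server\share\report.txt", r"Data\report.txt"]:
    print(raw, "->", drive_letter(raw))
```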


FAT

The File Allocation Table (FAT) filing system, supported by all versions of Microsoft Windows, was an evolution of that used in Microsoft's earlier operating system (MS-DOS, which in turn was based on 86-DOS). FAT ultimately traces its roots back to the short-lived M-DOS project and Standalone disk BASIC before it. Over the years various features have been added to it, inspired by similar features found on file systems used by operating systems such as Unix.


Older versions of the FAT file system (FAT12 and FAT16) had file name length limits, a limit on the number of entries in the root directory of the file system and had restrictions on the maximum size of FAT-formatted disks or partitions. Specifically, FAT12 and FAT16 had a limit of 8 characters for the file name, and 3 characters for the extension (such as .exe). This is commonly referred to as the 8.3 filename limit. VFAT, which was an extension to FAT12 and FAT16 introduced in Windows NT 3.5 and subsequently included in Windows 95, allowed long file names (LFN).
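The 8.3 rule is simple enough to check mechanically. This Python sketch covers only the length part of the rule described above (a base name of 1 to 8 characters and an optional extension of 1 to 3 characters after a single dot); the restricted character set and case handling of real FAT short names are ignored. The function name `fits_8_3` is hypothetical.

```python
def fits_8_3(name):
    """Rough check of the FAT12/FAT16 8.3 limit on a name's lengths.

    Only lengths and the single dot are checked; the restricted character
    set and case rules of real FAT short names are ignored.
    """
    base, dot, ext = name.partition(".")
    if not 1 <= len(base) <= 8:
        return False
    if dot and not 1 <= len(ext) <= 3:
        return False
    return "." not in ext            # a second dot is not allowed


for name in ["AUTOEXEC.BAT", "README", "LONGFILENAME.TXT", "INDEX.HTML", "prog.c.Z"]:
    print(name, fits_8_3(name))
```

Names that fail this check are exactly the ones VFAT had to store as long file names.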


FAT32 also addressed many of the limits in FAT12 and FAT16, but remains limited compared to NTFS.


exFAT (also known as FAT64) is the newest iteration of FAT, with certain advantages over NTFS with regard to file system overhead. exFAT is supported only by newer Windows systems, such as Windows Server 2003, Windows Vista, Windows Server 2008, and Windows 7; more recently, support has also been added for Windows XP.[8]

NTFS


NTFS, introduced with the Windows NT operating system, allowed ACL-based permission control. Hard links, multiple file streams, attribute indexing, quota tracking, sparse files, encryption, compression, and reparse points (directories working as mount points for other file systems, symlinks, junctions, remote storage links) are also supported, though not all of these features are well documented.


