Path: sserve!manuel.anu.edu.au!munnari.oz.au!news.hawaii.edu!ames!agate!toe.CS.Berkeley.EDU!bostic
From: bostic@toe.CS.Berkeley.EDU (Keith Bostic)
Newsgroups: comp.unix.bsd
Subject: Re: Largest file size for 386BSD ?
Date: 9 Nov 1992 18:58:25 GMT
Organization: University of California, Berkeley
Lines: 128
Message-ID: <1dmcchINNt54@agate.berkeley.edu>
References: <1992Nov6.031757.20766@ntuix.ntu.ac.sg> <1992Nov6.173454.17896@fcom.cc.utah.edu>
NNTP-Posting-Host: toe.cs.berkeley.edu

There are four issues for file size in a UNIX-like system:

	1: the off_t type, the file "offset" measured in bytes
	2: the logical block type, measured in X block units
	3: the physical block type, measured in Y block units
	4: the number of meta-data blocks you can access

The off_t is the value returned by lseek, and in all BSD systems with
the exception of 4.4BSD, it's a 32-bit signed quantity.  In 4.4BSD,
it's a 64-bit signed quantity.  (As a side note, this change broke
every application on the system.  The two big issues were programs
that depended on fseek and lseek returning similar values, and
programs that explicitly cast lseek values to longs.)  The 32-bit
off_t limit means that files cannot grow to be more than 2G in size;
the 64-bit limit means that you don't have to worry about it, because
the next three limits are going to kick in first.  So, the bottom line
for this limit is the maximum positive off_t value, because a single
out-of-band value, -1, is used to denote an error.

The second limit is the logical block type, which in a BSD system is a
daddr_t, a signed 32-bit quantity.  The logical block type bounds the
number of logical blocks that a file may have.  The reason that this
has to be a signed quantity is that the "name space" for logical
blocks is split into two parts, the data blocks and the meta-data
blocks.  Before 4.4BSD, the FFS used physical addresses for meta-data,
so this division wasn't necessary.  However, that implies that you
know the disk address of a block at all times.  In a log-structured
file system, since you don't know the address until you actually write
the block (for lots of reasons), the "logical" name space has to be
divided up.  In the 4BSD LFS (and the 4.4BSD FFS and the Sprite LFS)
the logical name space is split by the top bit, i.e. "negative" block
numbers are meta-data blocks.  So, the bottom line for this limit is
2^31 logical blocks in a file.

The third limit is the physical block type.  In UNIX-like systems, the
physical block is also a daddr_t.  In the FFS, it's the fragment size,
and the FFS addresses the disks in units of fragments, i.e. an 8K
block/1K fragment file system will address the disks in 1K units.
This limits the size of the physical device.

The fourth limit is the number of data blocks that are accessible
through triple-indirect addressing.  In 4BSD there are 12 (NDADDR)
direct blocks and 3 (NIADDR) levels of indirection, for a total of:

	NDADDR + NINDIR(blocksize) + NINDIR(blocksize)^2 + NINDIR(blocksize)^3

data blocks, where NINDIR(blocksize) is the number of block pointers
that fit in one indirect block.

Given 64-bit off_t's and 32-bit daddr_t's, this all boils down to:

	Block size    # of data blocks    Max file size    Limiting type
	    .5K             2113676            ~  1G              4
	     1K            16843020            ~ 16G              4
	     2K           134480396            ~262G              4
	     4K          1074791436            ~  4T              4
	     8K          2147483648            ~ 16T              2
	    16K          2147483648            ~ 32T              2

Note 1: For 32-bit off_t's, the maximum file size is 2G, except for
512-byte block file systems where it's 1G.  The limiting type for all
of these is #1, except for 512-byte block file systems where it's #4.
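For the curious, here is a small stand-alone C sketch that reproduces
the "# of data blocks" column above (the "Max file size" column is
just that count times the block size).  The NDADDR and NIADDR values
and the logical-block cap are hard-coded from the discussion rather
than read from the kernel headers, and changing PTRSIZE to 8 gives the
64-bit daddr_t numbers in Note 2 below.

	/*
	 * Sketch only: compute triple-indirect data block counts.
	 * NDADDR = 12 direct blocks, NIADDR = 3 levels of indirection,
	 * PTRSIZE = bytes per block pointer (4 for a 32-bit daddr_t,
	 * 8 for a 64-bit daddr_t).
	 */
	#include <stdio.h>

	#define NDADDR	12	/* direct block pointers in the inode */
	#define NIADDR	3	/* levels of indirection */
	#define PTRSIZE	4	/* bytes per block pointer */

	int
	main(void)
	{
		static const double bsize[] =
		    { 512, 1024, 2048, 4096, 8192, 16384 };
		/* Limit #2: 2^31 logical blocks for a 32-bit daddr_t,
		 * 2^63 for a 64-bit daddr_t. */
		const double logical_cap = (PTRSIZE == 4) ?
		    2147483648.0 : 9223372036854775808.0;
		int i, level;

		for (i = 0; i < 6; i++) {
			/* Pointers that fit in one indirect block. */
			double nindir = bsize[i] / PTRSIZE;
			double nblocks = NDADDR, power = 1;

			for (level = 0; level < NIADDR; level++) {
				power *= nindir;   /* nindir, nindir^2, nindir^3 */
				nblocks += power;
			}
			if (nblocks > logical_cap)  /* the logical block limit wins */
				nblocks = logical_cap;
			printf("%6.0f-byte blocks: %11.0f data blocks, %15.0f bytes max\n",
			    bsize[i], nblocks, nblocks * bsize[i]);
		}
		return (0);
	}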
Note 2: If we go to 64-bit daddr_t's, the branching factor goes DOWN,
because you need 8 bytes in the indirect block for each physical
block.  The table then becomes:

	Block size    # of data blocks    Max file size    Limiting type
	    .5K              266316            ~130M              4
	     1K             2113676            ~  2G              4
	     2K            16843020            ~ 32G              4
	     4K           134480396            ~512G              4
	     8K          1074791436            ~  8T              4
	    16K          8594130956            ~128T              4

>In article <1992Nov6.031757.20766@ntuix.ntu.ac.sg> eoahmad@ntuix.ntu.ac.sg (Othman Ahmad) writes:
>>This will be an important issue because soon we'll have hundreds of
>>gigabytes, instead of megabytes.  It took the jump from tens of
>>megabytes to hundreds in just 10 years.

There are two issues that you need to consider.  The first is the
actual physical data that you have, which can probably be satisfied,
in 99.99 percent of the cases, by 2G, let alone 16T.  The latter
figure is also probably fine given what we can physically store on
both magnetic and tertiary storage.  While it is true that big files
are getting bigger (by roughly an order of magnitude), most files are
about the same size they were ten years ago, i.e. 40% are under 1K and
80% are under 20K [SOSP '91, Mary Baker, Measurements of a Distributed
File System].  Even that order of magnitude isn't all that interesting
for this case, as most files simply aren't larger than 16T.

The second issue is the addressability of the data.  Some applications
want to store large objects (measured in megabytes) in a huge sparse
file.  These applications may have a 2G disk, but want files sized in
terabytes.  There is no satisfactory answer on most current UNIX
systems, but 64-bit daddr_t's would seem to make the situation better
(a sketch of such a sparse file follows at the end of this post).

In article <1992Nov6.173454.17896@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
>Get around the problem:
>
>1) Multiple partitions not exceeding the 4 Gig limit.
>2) Larger terminal blocks.
>3) Additional indirection levels.
>4) Assumption of larger files = log-structure file systems (ala Sprite).

The interesting point for me is #4, although I'm not really sure what
you meant.  The advantages of LFS are twofold.  First, the features
that theoretically would be available to applications, due to its
no-overwrite policy, are attractive, e.g. "unrm", versioning, and
transactions.  Second, with multiple writers it has the potential for
improved performance.  It is becoming clearer, at least to me, that
the LFS performance advantages are not as obvious as they originally
appeared, mostly because of the strong effects of the cleaner.  I'm
starting to agree with Larry McVoy [USENIX, January 1991, Extent-like
Performance from a UNIX File System] that FFS with read/write
clustering is just as fast as LFS in many circumstances, and faster in
lots of large-file applications where the disk is over, say, 80%
utilized.

Keith Bostic
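To make the sparse-file point above concrete, here is a minimal
sketch, assuming a system whose off_t is 64 bits; the file name and
the 1T offset are arbitrary choices for illustration.

	/*
	 * Sketch only: create a file whose logical size is just over 1T
	 * while consuming only a few real disk blocks.  Assumes a
	 * 64-bit off_t; with a 32-bit off_t the 1T offset doesn't even
	 * fit in the type.
	 */
	#include <sys/types.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		off_t tb = (off_t)1024 * 1024 * 1024 * 1024;	/* 1T offset */
		int fd;

		/* "sparse.test" is just an example name. */
		if ((fd = open("sparse.test",
		    O_RDWR | O_CREAT | O_TRUNC, 0666)) < 0) {
			perror("open");
			return (1);
		}

		/*
		 * Seek a terabyte into the file and write a single byte.
		 * Only the block actually written (plus its indirect
		 * blocks) consumes disk space; everything before it is a
		 * hole.
		 */
		if (lseek(fd, tb, SEEK_SET) == (off_t)-1) {
			perror("lseek");
			return (1);
		}
		if (write(fd, "x", 1) != 1) {
			perror("write");
			return (1);
		}
		(void)close(fd);
		return (0);
	}

On such a system, ls -l reports the terabyte-plus logical size while
du reports only the handful of blocks actually allocated.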