Return to BSD News archive
Newsgroups: comp.os.386bsd.questions
Path: sserve!newshost.anu.edu.au!munnari.oz.au!metro!dmssyd.syd.dms.CSIRO.AU!megadata!andrew
From: andrew@megadata.mega.com.au (Andrew McRae)
Subject: Re: Ethernet [really TCP] performance measurement
Message-ID: <1993Jun22.235220.20247@mega.com.au>
Originator: andrew@noah
Sender: news@mega.com.au
Organization: Megadata P/L, North Ryde, Sydney, Aust.
References: <1993Jun14.234422.22663@uvm.edu>
Date: Tue, 22 Jun 1993 23:52:20 GMT
Lines: 177
To: wollman@sadye.emba.uvm.edu
Subject: Re: Ethernet [really TCP] performance measurement
wollman@sadye.emba.uvm.edu (Garrett Wollman):
> In article <C8L7A8.1v1@implode.rain.com> davidg@implode.rain.com (David Greenman) writes:
> When I ever get around to it, I may consider doing some kernel
> profiling to see what's slowing things up. In the mean time, here's
> my `localhost' results for both send and receive, using the 8k buffer
> size, while I wait until I have enough money for a nice 486/33...
>
> -GAWollman
I gave a paper on this at Usenix in January, which described
a hardware profiling method. I used 386BSD as the target system, and
one of the case studies dealt with network performance (TCP to be
be precise). I encourage anyone who wants some hard numbers to look
at this paper. In essence, whilst the ISA bus is a major bottleneck
for the Ethernet card, other bottlenecks appear such as the IP
checksumming code and lots of other little things, such as poor
interrupt latency due to high spl* overhead. Cranking up the
CPU speed helps a little, cranking up (or removing) the ISA bus
overhead helps, but what would help most is redoing some of the
code paths.
I have reproduced some results below. Elapsed is the wall time
total for each routine, Net is the net time within the routine (ie.
not including subroutines), max/avg/min is the number of
microseconds for each call, and the percentages expresses the
proportion this routine uses of the total time (% real is the
proportion of wall time, % net the proportion of time when the
CPU was actually busy).
Elapsed Net # calls (max/avg/min) % real % net name
--------------------------------------------------------------
166394 165519 930 (1089/177/2) 5.75% 31.44% bcopy
152382 151700 514 (901/295/23) 5.27% 28.81% in_cksum
29401 29401 2762 (13/10/8) 1.02% 5.58% splnet
442031 16391 166 (125/98/87) 0.57% 3.11% soreceive
15421 15421 710 (24/21/21) 0.54% 2.93% spl0
10948 10898 3066 (19/3/3) 0.38% 2.07% splx
16069 9855 433 (36/22/18) 0.34% 1.87% malloc
202651 9132 86 (193/106/28) 0.32% 1.73% werint
419514 8997 377 (40/23/18) 0.31% 1.71% ISAINTR
183830 7989 170 (98/46/18) 0.28% 1.52% weget
13646 7576 423 (23/17/15) 0.26% 1.44% free
19467 7189 218 (78/32/12) 0.25% 1.37% westart
213256 6270 89 (96/70/48) 0.22% 1.19% weintr
6107 6107 596 (12/10/9) 0.21% 1.16% splhigh
6081 6081 334 (42/18/3) 0.21% 1.15% copyout
153917 6034 170 (115/35/32) 0.21% 1.15% tcp_input
44937 5962 171 (78/34/15) 0.21% 1.13% tcp_output
166841 5810 44 (155/132/55) 0.20% 1.10% ipintr
7324 4075 170 (35/23/19) 0.14% 0.77% ether_input
450706 3640 167 (25/21/18) 0.13% 0.69% syscall
48918 3399 168 (25/20/17) 0.12% 0.65% tcp_usrreq
9451 3336 332 (25/10/8) 0.12% 0.63% uiomove
25567 3170 87 (43/36/33) 0.11% 0.60% ether_output
A typical code trace of a TCP packet being received is show below (all
times in microseconds) - I have highlighted interesting numbers.
Note how many splnet, splx, splhigh routines get called, and how long
each one takes. (All numbers achieved using 386/33 with 64Kb cache).
2:394 270 -> ISAINTR (26 us, 9057 total)
2:394 278 -> weintr (90 us, 4943 total)
2:394 311 -> werint (187 us, 4768 total)
2:394 348 -> weread (14 us, 1332 total)
2:394 359 -> weget (54 us, 1275 total)
2:394 365 -> malloc (27 us, 42 total)
2:394 373 -> splnet (11 us)
2:394 394 -> splx (4 us)
2:394 407 <-
2:394 413 -> min (3 us)
2:394 418 -> splnet (11 us)
2:394 437 -> splx (5 us)
2:394 452 -> min (3 us)
2:394 458 -> bcopy (1073 us) ***************
2:395 535 -> malloc (22 us, 37 total)
2:395 539 -> splnet (11 us)
2:395 559 -> splx (4 us)
2:395 572 <-
2:395 578 -> min (1 us)
2:395 583 -> bcopy (46 us)
2:395 634 <-
2:395 636 -> ether_input (25 us, 43 total)
2:395 640 -> bcmp (4 us)
2:395 653 -> splnet (10 us)
2:395 666 -> splx (4 us)
2:395 679 <-
2:395 680 <-
Three more weread() traces like above deleted..
2:399 079 <-
2:399 080 -> westart (16 us, 28 total)
2:399 083 -> splhigh (9 us)
2:399 096 -> splx (3 us)
2:399 108 <-
2:399 141 -> werint (30 us)
2:399 172 -> westart (14 us, 27 total)
2:399 174 -> splhigh (10 us)
2:399 187 -> splx (3 us)
2:399 199 <-
2:399 221 <-
2:399 233 -> splnet (10 us)
2:399 246 -> ipintr (137 us, 4056 total)
2:399 248 -> splnet (10 us)
2:399 262 -> splx (3 us)
2:399 280 -> in_cksum (23 us)
2:399 322 -> tcp_input (53 us, 1062 total)
2:399 333 -> in_cksum (901 us, 941 total)
2:399 575 -> ISAINTR (20 us, 40 total)
2:399 584 -> hardclock (9 us, 20 total)
2:399 589 -> gatherstats (11 us)
2:399 604 <-
2:399 615 <-
2:400 274 <-
2:400 304 -> imin (2 us)
2:400 312 -> sbappend (7 us, 22 total)
2:400 317 -> sbcompress (15 us)
2:400 334 <-
2:400 337 -> sowakeup (9 us, 44 total)
2:400 342 -> wakeup (19 us, 35 total)
2:400 344 -> splhigh (10 us)
2:400 360 -> setrq (3 us)
2:400 366 -> splx (3 us)
2:400 377 <-
2:400 381 <-
2:400 384 <-
2:400 386 -> splnet (11 us)
2:400 400 -> splx (3 us)
2:400 416 -> in_cksum (25 us)
2:400 452 -> tcp_input (35 us, 900 total)
2:400 459 -> in_cksum (843 us) ***************************
2:401 322 -> imin (1 us)
2:401 327 -> sbappend (5 us, 17 total)
2:401 331 -> sbcompress (12 us)
2:401 344 <-
2:401 346 -> sowakeup (4 us)
2:401 352 <-
2:401 354 -> splnet (11 us)
2:401 367 -> splx (4 us)
2:401 383 -> in_cksum (23 us)
2:401 417 -> tcp_input (33 us, 899 total)
2:401 423 -> in_cksum (843 us)
2:402 286 -> imin (1 us)
2:402 290 -> sbappend (7 us, 18 total)
2:402 296 -> sbcompress (11 us)
2:402 308 <-
2:402 310 -> sowakeup (4 us)
2:402 316 <-
2:402 318 -> splnet (11 us)
2:402 331 -> splx (4 us)
2:402 346 -> in_cksum (23 us)
2:402 381 -> tcp_input (33 us, 893 total)
2:402 387 -> in_cksum (835 us)
2:403 241 -> imin (1 us)
2:403 246 -> sbappend (9 us, 20 total)
2:403 254 -> sbcompress (11 us)
2:403 266 <-
2:403 268 -> sowakeup (4 us)
2:403 274 <-
2:403 276 -> splnet (11 us)
2:403 289 -> splx (3 us)
2:403 302 <-
2:403 303 -> spl0 (22 us)
2:403 327 <-
2:403 333 -> spl0 (23 us)
In the hope that some hard data is useful...
Cheers,
Andrew McRae inet: andrew@mega.com.au
Megadata Pty Ltd, uucp: ..!uunet!mega.com.au!andrew
North Ryde 2113 Phone: +61 2 805 0899
NSW AUSTRALIA Fax: +61 2 887 4847