Return to BSD News archive
Newsgroups: comp.os.386bsd.questions Path: sserve!newshost.anu.edu.au!munnari.oz.au!metro!dmssyd.syd.dms.CSIRO.AU!megadata!andrew From: andrew@megadata.mega.com.au (Andrew McRae) Subject: Re: Ethernet [really TCP] performance measurement Message-ID: <1993Jun22.235220.20247@mega.com.au> Originator: andrew@noah Sender: news@mega.com.au Organization: Megadata P/L, North Ryde, Sydney, Aust. References: <1993Jun14.234422.22663@uvm.edu> Date: Tue, 22 Jun 1993 23:52:20 GMT Lines: 177 To: wollman@sadye.emba.uvm.edu Subject: Re: Ethernet [really TCP] performance measurement wollman@sadye.emba.uvm.edu (Garrett Wollman): > In article <C8L7A8.1v1@implode.rain.com> davidg@implode.rain.com (David Greenman) writes: > When I ever get around to it, I may consider doing some kernel > profiling to see what's slowing things up. In the mean time, here's > my `localhost' results for both send and receive, using the 8k buffer > size, while I wait until I have enough money for a nice 486/33... > > -GAWollman I gave a paper on this at Usenix in January, which described a hardware profiling method. I used 386BSD as the target system, and one of the case studies dealt with network performance (TCP to be be precise). I encourage anyone who wants some hard numbers to look at this paper. In essence, whilst the ISA bus is a major bottleneck for the Ethernet card, other bottlenecks appear such as the IP checksumming code and lots of other little things, such as poor interrupt latency due to high spl* overhead. Cranking up the CPU speed helps a little, cranking up (or removing) the ISA bus overhead helps, but what would help most is redoing some of the code paths. I have reproduced some results below. Elapsed is the wall time total for each routine, Net is the net time within the routine (ie. not including subroutines), max/avg/min is the number of microseconds for each call, and the percentages expresses the proportion this routine uses of the total time (% real is the proportion of wall time, % net the proportion of time when the CPU was actually busy). Elapsed Net # calls (max/avg/min) % real % net name -------------------------------------------------------------- 166394 165519 930 (1089/177/2) 5.75% 31.44% bcopy 152382 151700 514 (901/295/23) 5.27% 28.81% in_cksum 29401 29401 2762 (13/10/8) 1.02% 5.58% splnet 442031 16391 166 (125/98/87) 0.57% 3.11% soreceive 15421 15421 710 (24/21/21) 0.54% 2.93% spl0 10948 10898 3066 (19/3/3) 0.38% 2.07% splx 16069 9855 433 (36/22/18) 0.34% 1.87% malloc 202651 9132 86 (193/106/28) 0.32% 1.73% werint 419514 8997 377 (40/23/18) 0.31% 1.71% ISAINTR 183830 7989 170 (98/46/18) 0.28% 1.52% weget 13646 7576 423 (23/17/15) 0.26% 1.44% free 19467 7189 218 (78/32/12) 0.25% 1.37% westart 213256 6270 89 (96/70/48) 0.22% 1.19% weintr 6107 6107 596 (12/10/9) 0.21% 1.16% splhigh 6081 6081 334 (42/18/3) 0.21% 1.15% copyout 153917 6034 170 (115/35/32) 0.21% 1.15% tcp_input 44937 5962 171 (78/34/15) 0.21% 1.13% tcp_output 166841 5810 44 (155/132/55) 0.20% 1.10% ipintr 7324 4075 170 (35/23/19) 0.14% 0.77% ether_input 450706 3640 167 (25/21/18) 0.13% 0.69% syscall 48918 3399 168 (25/20/17) 0.12% 0.65% tcp_usrreq 9451 3336 332 (25/10/8) 0.12% 0.63% uiomove 25567 3170 87 (43/36/33) 0.11% 0.60% ether_output A typical code trace of a TCP packet being received is show below (all times in microseconds) - I have highlighted interesting numbers. Note how many splnet, splx, splhigh routines get called, and how long each one takes. (All numbers achieved using 386/33 with 64Kb cache). 2:394 270 -> ISAINTR (26 us, 9057 total) 2:394 278 -> weintr (90 us, 4943 total) 2:394 311 -> werint (187 us, 4768 total) 2:394 348 -> weread (14 us, 1332 total) 2:394 359 -> weget (54 us, 1275 total) 2:394 365 -> malloc (27 us, 42 total) 2:394 373 -> splnet (11 us) 2:394 394 -> splx (4 us) 2:394 407 <- 2:394 413 -> min (3 us) 2:394 418 -> splnet (11 us) 2:394 437 -> splx (5 us) 2:394 452 -> min (3 us) 2:394 458 -> bcopy (1073 us) *************** 2:395 535 -> malloc (22 us, 37 total) 2:395 539 -> splnet (11 us) 2:395 559 -> splx (4 us) 2:395 572 <- 2:395 578 -> min (1 us) 2:395 583 -> bcopy (46 us) 2:395 634 <- 2:395 636 -> ether_input (25 us, 43 total) 2:395 640 -> bcmp (4 us) 2:395 653 -> splnet (10 us) 2:395 666 -> splx (4 us) 2:395 679 <- 2:395 680 <- Three more weread() traces like above deleted.. 2:399 079 <- 2:399 080 -> westart (16 us, 28 total) 2:399 083 -> splhigh (9 us) 2:399 096 -> splx (3 us) 2:399 108 <- 2:399 141 -> werint (30 us) 2:399 172 -> westart (14 us, 27 total) 2:399 174 -> splhigh (10 us) 2:399 187 -> splx (3 us) 2:399 199 <- 2:399 221 <- 2:399 233 -> splnet (10 us) 2:399 246 -> ipintr (137 us, 4056 total) 2:399 248 -> splnet (10 us) 2:399 262 -> splx (3 us) 2:399 280 -> in_cksum (23 us) 2:399 322 -> tcp_input (53 us, 1062 total) 2:399 333 -> in_cksum (901 us, 941 total) 2:399 575 -> ISAINTR (20 us, 40 total) 2:399 584 -> hardclock (9 us, 20 total) 2:399 589 -> gatherstats (11 us) 2:399 604 <- 2:399 615 <- 2:400 274 <- 2:400 304 -> imin (2 us) 2:400 312 -> sbappend (7 us, 22 total) 2:400 317 -> sbcompress (15 us) 2:400 334 <- 2:400 337 -> sowakeup (9 us, 44 total) 2:400 342 -> wakeup (19 us, 35 total) 2:400 344 -> splhigh (10 us) 2:400 360 -> setrq (3 us) 2:400 366 -> splx (3 us) 2:400 377 <- 2:400 381 <- 2:400 384 <- 2:400 386 -> splnet (11 us) 2:400 400 -> splx (3 us) 2:400 416 -> in_cksum (25 us) 2:400 452 -> tcp_input (35 us, 900 total) 2:400 459 -> in_cksum (843 us) *************************** 2:401 322 -> imin (1 us) 2:401 327 -> sbappend (5 us, 17 total) 2:401 331 -> sbcompress (12 us) 2:401 344 <- 2:401 346 -> sowakeup (4 us) 2:401 352 <- 2:401 354 -> splnet (11 us) 2:401 367 -> splx (4 us) 2:401 383 -> in_cksum (23 us) 2:401 417 -> tcp_input (33 us, 899 total) 2:401 423 -> in_cksum (843 us) 2:402 286 -> imin (1 us) 2:402 290 -> sbappend (7 us, 18 total) 2:402 296 -> sbcompress (11 us) 2:402 308 <- 2:402 310 -> sowakeup (4 us) 2:402 316 <- 2:402 318 -> splnet (11 us) 2:402 331 -> splx (4 us) 2:402 346 -> in_cksum (23 us) 2:402 381 -> tcp_input (33 us, 893 total) 2:402 387 -> in_cksum (835 us) 2:403 241 -> imin (1 us) 2:403 246 -> sbappend (9 us, 20 total) 2:403 254 -> sbcompress (11 us) 2:403 266 <- 2:403 268 -> sowakeup (4 us) 2:403 274 <- 2:403 276 -> splnet (11 us) 2:403 289 -> splx (3 us) 2:403 302 <- 2:403 303 -> spl0 (22 us) 2:403 327 <- 2:403 333 -> spl0 (23 us) In the hope that some hard data is useful... Cheers, Andrew McRae inet: andrew@mega.com.au Megadata Pty Ltd, uucp: ..!uunet!mega.com.au!andrew North Ryde 2113 Phone: +61 2 805 0899 NSW AUSTRALIA Fax: +61 2 887 4847