*BSD News Article 32140

Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!bunyip.cc.uq.oz.au!munnari.oz.au!comp.vuw.ac.nz!newshost.wcc.govt.nz!aladdin.wcc.govt.nz!zheng
From: zheng@aladdin.wcc.govt.nz ()
Newsgroups: comp.os.386bsd.questions
Subject: netbsd-0.9 syscall -- LONG !!! --
Date: 23 Jun 1994 10:54:17 GMT
Organization: Wellington City Council, Wellington, New Zealand
Lines: 213
Sender: Chuck Zheng (zheng@aladdin.wcc.govt.nz)
Message-ID: <2ubpkp$ccg@golem.wcc.govt.nz>
NNTP-Posting-Host: aladdin.wcc.govt.nz

Hello,

I have long wondered how a user process traps into the kernel via a
system call on NetBSD (v0.9).  After several months of on-and-off study
of the NetBSD-0.9 source code, Bill Jolitz's DDJ series and a 386
programmer's guide, together with advice from several people on the
net, I have finally pieced the picture together into a recognizable
form, as listed below.  I still have not found exactly where and how
the user process LDT is set up.  I would welcome your comments and
advice.


cheers,
chuck


How NetBSD-0.9 does a system call
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- The user process issues an intersegment (far procedure) call: [DDJ,Jan91,p38]

	lcall	$0x7,	0x0	# [DDJ,Jan91,p39,Fig8]

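  which saves the current code segment selector (the value of CS) and the
  program counter (EIP) on the stack.  Control then transfers to the
  destination named by the operand, a (16-bit selector : 32-bit offset)
  pair, here (0x7, 0x0). [386/486,p187]  Note that selector 0x7 turns out
  to name a call gate (see below), so CS is loaded from the selector stored
  in the gate rather than with 0x7 itself, and the offset operand is ignored.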

- In practice, to work around a bug in GAS, the user process uses a macro:

	LCALL(0x7,0x0)

  which expands to

	.byte 0x9a ; .long 0x0; .word 0x7

  and is equivalent to

	lcall	$0x7,	0x0	

  ## source: [DDJ,Mar91,p82]
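
  To make the byte layout of that expansion concrete, here is a minimal
  sketch in C.  encode_lcall() is a name made up for this illustration and
  is not part of the NetBSD sources; it only assumes the layout shown by the
  macro above: opcode 0x9a, then the 32-bit offset, then the 16-bit selector,
  both little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): write the 7-byte encoding of
 * a direct far call into buf and return the number of bytes written.
 */
size_t encode_lcall(uint8_t buf[7], uint16_t sel, uint32_t off)
{
    size_t n = 0;
    int i;

    assert(buf != NULL);
    buf[n++] = 0x9a;                            /* .byte 0x9a */
    for (i = 0; i < 4; i++)                     /* .long off  */
        buf[n++] = (uint8_t)(off >> (8 * i));
    for (i = 0; i < 2; i++)                     /* .word sel  */
        buf[n++] = (uint8_t)(sel >> (8 * i));
    return n;
}
```

  For selector 0x7 and offset 0x0 this fills the buffer with the bytes
  9a 00 00 00 00 07 00.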


- This 16-bit selector 0x7 = 0000 0000 0000 0111 (in binary) [386/486,p59]
			     ----------------A--
				Index = 0    | RPL = 3 = user mode/level
					     |
					     TI = 1 = Local Descriptor Table

  Thus the selector points to slot zero (the first entry) of the
  current LDT [386/486,p58].
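
  The same decoding can be written as a few shifts and masks.  This is only
  a sketch: struct selector_fields, decode_selector() and make_selector() are
  names invented here, and the only thing assumed is the field layout
  described above (index in bits 15..3, TI in bit 2, RPL in bits 1..0).

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of an i386 segment selector. */
struct selector_fields {
    unsigned index;  /* descriptor slot within the table */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, 0..3 */
};

/* Hypothetical helper: split a selector into its fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}

/* Hypothetical helper: the inverse, building a selector from fields. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

  decode_selector(0x7) gives index 0, TI 1, RPL 3, matching the diagram.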

- The first entry of the *KERNEL* LDT is set up in /sys/arch/i386/i386/machdep.c:

  ...
  #define GCODE_SEL       1       /* Kernel Code Descriptor */
  #define GLDT_SEL        3       /* LDT - eventually one per process */

  ...
  /* local descriptor table */
  union descriptor ldt[5];
  #define LSYS5CALLS_SEL  0       /* forced by intel BCS */
  ...

  init386(first)
  {
  ...
  struct gate_descriptor *gdp;

  ...
  lldt(GSEL(GLDT_SEL, SEL_KPL));

  ...
  /* make a call gate to reenter kernel with */
  gdp = &ldt[LSYS5CALLS_SEL].gd;

  x = (int) &IDTVEC(syscall);
  gdp->gd_looffset = x++;
  gdp->gd_selector = GSEL(GCODE_SEL,SEL_KPL);
  gdp->gd_stkcpy = 0;
  gdp->gd_type = SDT_SYS386CGT;
  gdp->gd_dpl = SEL_UPL;
  gdp->gd_p = 1;
  gdp->gd_hioffset = ((int) &IDTVEC(syscall)) >>16;

  ...
  }

  The macros and structs are defined in /sys/arch/i386/include/segments.h:

  #define SEL_KPL 0               /* kernel priority level */     
  #define SEL_UPL 3               /* user priority level */
  #define GSEL(s,r)       (((s)<<3) | r)         /* a global selector */
  ...
  #define SDT_SYS386CGT   12      /* system 386 call gate */

  lldt() is defined in /sys/arch/i386/i386/locore.s:

  #define ENTRY(name)     .globl _/**/name; ALIGN_TEXT; _/**/name:

  ...
    /*
    * void lldt(u_short sel)
    */
  ENTRY(lldt)
    lldt    4(%esp)
    ret


  init386() is invoked during system startup from
  /sys/arch/i386/i386/locore.s.  It sets up the first entry of the LDT as a
  call gate descriptor whose selector

	GSEL(GCODE_SEL,SEL_KPL)

  points to the kernel code segment, together with an offset into that
  segment. [386/486,p112]  The offset is the kernel-space address

	&IDTVEC(syscall)

  which is defined in /sys/arch/i386/i386/locore.s.  (Macros in locore.s and
  machdep.c expand IDTVEC.)  Despite the name, this appears to be the address
  of the system call entry code itself rather than of a slot in the interrupt
  descriptor table (IDT), since the offset in a call gate is where execution
  continues in the target segment.
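
  As a rough check of the arithmetic, the gate setup can be mimicked in plain
  C.  This is a sketch under assumptions: struct mock_gate uses one ordinary
  member per field instead of the packed bit-fields of the real
  gate_descriptor, make_syscall_gate() and gate_target() are invented names,
  and the handler address used with them is arbitrary.  The constants are the
  ones quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEL_KPL        0   /* kernel privilege level */
#define SEL_UPL        3   /* user privilege level */
#define GCODE_SEL      1   /* GDT slot of the kernel code descriptor */
#define SDT_SYS386CGT 12   /* system 386 call gate */
#define GSEL(s, r)    (((s) << 3) | (r))

/* Simplified stand-in for a gate descriptor; no bit packing. */
struct mock_gate {
    uint16_t looffset;  /* handler address, bits 15..0 */
    uint16_t selector;  /* code segment entered through the gate */
    uint8_t  stkcpy;    /* parameter words copied to the new stack */
    uint8_t  type;      /* descriptor type */
    uint8_t  dpl;       /* least privileged level allowed to call */
    uint8_t  p;         /* present bit */
    uint16_t hioffset;  /* handler address, bits 31..16 */
};

/* Hypothetical helper: fill a gate the way init386() fills ldt[0]. */
struct mock_gate make_syscall_gate(uint32_t handler)
{
    struct mock_gate g;

    g.looffset = (uint16_t)(handler & 0xffff);
    g.selector = GSEL(GCODE_SEL, SEL_KPL);
    g.stkcpy   = 0;
    g.type     = SDT_SYS386CGT;
    g.dpl      = SEL_UPL;
    g.p        = 1;
    g.hioffset = (uint16_t)(handler >> 16);
    return g;
}

/* Reassemble the 32-bit entry point stored in the two offset halves. */
uint32_t gate_target(const struct mock_gate *g)
{
    assert(g != NULL);
    return ((uint32_t)g->hioffset << 16) | g->looffset;
}
```

  GSEL(GCODE_SEL,SEL_KPL) works out to 0x08: GDT slot 1 with RPL 0.  The
  gate's DPL of 3 is what lets user-mode code call through it, while the
  selector stored inside the gate is what raises the privilege level.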

- The kernel LDT setup is copied (inherited) by each subsequent user process
  during fork().  /sys/kern/kern_fork.c reads:

  int fork1(p1, isvfork, retval)
    register struct proc *p1;
    int isvfork, retval[];
  {
    register struct proc *p2;

    ...
    /* Allocate new proc. */
    MALLOC(p2, struct proc *, sizeof(struct proc), M_PROC, M_WAITOK);

    ...   
    /*
    * Make a proc table entry for the new process.
    * Start by zeroing the section of proc that is zero-initialized,
    * then copy the section that is copied directly from the parent.
    */
    bzero(&p2->p_startzero,
      (unsigned) ((caddr_t)&p2->p_endzero - (caddr_t)&p2->p_startzero));
    bcopy(&p1->p_startcopy, &p2->p_startcopy,
      (unsigned) ((caddr_t)&p2->p_endcopy - (caddr_t)&p2->p_startcopy));

    ...
  }
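
  The bzero/bcopy pair relies on marker names that delimit two runs of
  fields inside the structure.  Here is a minimal sketch of that technique on
  a made-up structure; struct mock_proc, its fields and mock_fork_init() are
  all invented for illustration, and the real markers in proc.h may be
  defined differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock process record with a zeroed range and a copied range. */
struct mock_proc {
    int  id;            /* set individually for each new process */
#define m_startzero m_children
    int  m_children;    /* zeroed in the child */
    long m_cputime;     /* zeroed in the child */
#define m_endzero m_startcopy
#define m_startcopy m_priority
    int  m_priority;    /* inherited from the parent */
    char m_name[8];     /* inherited from the parent */
    int  m_endcopy;     /* marker: first field past the copied range */
};

/* Zero one range of the child and copy the other from the parent. */
void mock_fork_init(const struct mock_proc *parent, struct mock_proc *child)
{
    size_t z0 = offsetof(struct mock_proc, m_startzero);
    size_t z1 = offsetof(struct mock_proc, m_endzero);
    size_t c0 = offsetof(struct mock_proc, m_startcopy);
    size_t c1 = offsetof(struct mock_proc, m_endcopy);

    assert(parent != child);
    memset((char *)child + z0, 0, z1 - z0);
    memcpy((char *)child + c0, (const char *)parent + c0, c1 - c0);
}
```

  Fields outside both ranges (id here) are left alone and must be filled in
  separately.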

  struct proc is defined in /usr/src/sys/sys/proc.h.  A proc slot has a pointer
  to struct user:

	struct  user *p_addr;   /* kernel virtual addr of u-area (PROC ONLY) */

  struct user is defined in /usr/src/sys/sys/user.h.  user has a field:

	struct  pcb u_pcb;

  struct pcb is defined in /sys/arch/i386/include/pcb.h.  pcb has a field:

	struct  i386tss pcb_tss;

  struct i386tss is defined in /sys/arch/i386/include/tss.h.  i386tss has a
  field:

	int     tss_ldt;        /* actually 16 bits: top 16 bits must be zero*/

  This field holds the selector for LDTR.
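
  The chain of fields can be followed in a small mock-up.  The chain_*
  structures below keep only the members named above, and
  proc_ldt_selector() and copy_pcb() are invented for this sketch.  The
  second helper shows the difference between copying the p_addr pointer and
  copying the pcb it leads to.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for i386tss, pcb, user and proc. */
struct chain_tss  { int tss_ldt; };               /* low 16 bits used */
struct chain_pcb  { struct chain_tss pcb_tss; };
struct chain_user { struct chain_pcb u_pcb; };
struct chain_proc { struct chain_user *p_addr; };

/* Follow proc -> user -> pcb -> tss to reach the LDT selector. */
int proc_ldt_selector(const struct chain_proc *p)
{
    assert(p != NULL && p->p_addr != NULL);
    return p->p_addr->u_pcb.pcb_tss.tss_ldt & 0xffff;
}

/* Duplicate the pcb contents; each proc keeps its own u-area. */
void copy_pcb(const struct chain_proc *from, struct chain_proc *to)
{
    assert(from->p_addr != NULL && to->p_addr != NULL);
    to->p_addr->u_pcb = from->p_addr->u_pcb;
}
```

  With tss_ldt set to 0x18, which is GSEL(GLDT_SEL, SEL_KPL), in one u-area,
  copy_pcb() leaves the same selector in the other u-area while the two
  p_addr pointers stay distinct.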


  Since bcopy copies only the pointer "struct user *p_addr" from p1 to p2,
  I guess (XXX) that some routine I have overlooked must do the actual copy
  of tss_ldt from p1 to p2, or else a copy-on-write scheme is implemented
  (is that vfork?).

  Can somebody clarify this further (and point out any errors I have made
  above)?  There is also a cpu_fork() routine in
  /sys/arch/i386/i386/vm_machdep.c, which does some copying similar to
  fork1():

  cpu_fork(p1, p2)
    register struct proc *p1, *p2;
  {
    ...
    /*
     * Copy pcb and stack from proc p1 to p2. 
     * We do this as cheaply as possible, copying only the active
     * part of the stack.  The stack and pcb need to agree;
     * this is tricky, as the final pcb is constructed by savectx,
     * but its frame isn't yet on the stack when the stack is copied.
     * swtch compensates for this when the child eventually runs.
     * This should be done differently, with a single call
     * that copies and updates the pcb+stack,
     * replacing the bcopy and savectx.
     */
    p2->p_addr->u_pcb = p1->p_addr->u_pcb;
    offset = mvesp() - (int)kstack;
    bcopy((caddr_t)kstack + offset, (caddr_t)p2->p_addr + offset,
      (unsigned) ctob(UPAGES) - offset);
    p2->p_regs = p1->p_regs;

  ...
  }
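
  The offset arithmetic in that excerpt can be tried out on an ordinary
  buffer.  This is only a sketch: MOCK_USIZE stands in for ctob(UPAGES) with
  an arbitrary small value, copy_active_stack() is an invented name, and the
  stack is assumed to grow downward so that the live part runs from the
  stack pointer to the top of the area.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_USIZE 64   /* stand-in for ctob(UPAGES); arbitrary size */

/*
 * Copy only the in-use part of a downward-growing stack area.
 * sp_offset is the distance of the stack pointer from the base of the
 * area, which is what "mvesp() - (int)kstack" computes in cpu_fork().
 * Returns the number of bytes copied.
 */
size_t copy_active_stack(const unsigned char *src, unsigned char *dst,
                         size_t sp_offset)
{
    size_t len;

    assert(sp_offset <= MOCK_USIZE);
    len = MOCK_USIZE - sp_offset;
    memcpy(dst + sp_offset, src + sp_offset, len);
    return len;
}
```

  With the stack pointer 48 bytes above the base of a 64-byte area, only the
  top 16 bytes are copied and the rest of the destination is left untouched.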

  I have not been able to find out who calls this routine.  Does anyone
  know what it is for and how it is used?


Reference
~~~~~~~~~
[DDJ] -- Dr. Dobb's Journal

[386/486] -- Microsoft's 80386/80486 Programming Guide.  Ross Nelson.