matthias muntwiler bbd16d0f94 add files for public distribution
based on internal repository 0a462b6 2017-11-22 14:41:39 +0100
2017-11-22 14:55:20 +01:00

202 lines
7.6 KiB
Mathematica

***************************************************************
* LOESS smoothing scattered data in one or more variables *
* documentation of Fortran routines *
* Cleveland, Devlin, Grosse, Shyu *
***************************************************************
1. The typical program would call lowesd, set tolerances in iv,v if
desired, then call lowesb and lowese.
2. To save the k-d tree, call lowesd, lowesb and then losave; subsequent
programs would call lohead, set liv and lv, then call loread and lowese.
3. For statistics, get diagL and then call lowesa or get the full hat
matrix and call lowesc. Robustness iterations can take advantage of
lowesw and lowesp.
lowesd(106,iv,liv,lv,v,d,n,f,tdeg,nvmax,setLf) setup workspace
lowesf(x,y,w,iv,liv,lv,v,m,z,L,hat,s) slow smooth at z
lowesb(x,y,w,diagL,infl,iv,liv,lv,v) build k-d tree
lowesr(y,iv,liv,lv,v) rebuild with new data values
(does not change y)
lowese(iv,liv,lv,v,m,z, s) evaluate smooth at z
lowesl(iv,liv,lv,v,m,z, L) explicit hat matrix,
which maps from y to z
lofort(iunit,iv,liv,lv,v) save k-d tree as Fortran
losave(iunit,iv,liv,lv,v) save k-d tree in file
lohead(iunit,d,vc,nc,nv) read d,vc,nc,nv from file
liv = 50+(vc+3)*nc determine space
lv = 50+(2*d+1)*nv+nc requirements
loread(iunit,d,vc,nc,nv,iv,liv,lv,v) finish reading k-d tree,
ready for lowese
lowesa(trL,n,d,tau,nsing, del1,del2) approximate delta
lowesc(n,L,LL, trL,del1,del2) exact delta
lowesp(n,y,yhat,w,rw, pi,ytilde) pseudo-values
lowesw(res,n, rw,pi) robustness weights
=== arguments ===
d number of independent variables [integer] (called "p" elsewhere)
del1,del2 delta1, delta2
diagL diagonal of hat matrix, only set if infl=.true. (n)
f fraction of points to use in local smooth (called "alpha" elsewhere)
fc don't refine cells with less than fc*n points; ordinarily=.05
hat is hat matrix desired? [integer]
0 = none
1 = diagonal only
2 = full matrix
infl is diagonal of hat matrix desired? [logical]
iunit Fortran unit number for i/o
iv workspace (liv)
L hat matrix (m,n) [real]
in lowesf, only computed if hat nonzero; if hat=1 only size (n)
LL workspace (n,n)
liv 50+(2^d+4)*nvmax+2*n
if setLf, add nf*nvmax
lv 50+(3*d+3)*nvmax+n+(tau0+2)*nf
if setLf, add (d+1)*nf*nvmax
m number of points to smooth at; ordinarily=n
n number of observations
nf min(n,floor(n*f))
nsing if 0, print warning in lowesa when trL<tau; typically nsing=iv(30)
nvmax limit on number of vertices for kd-tree; e.g. max(200,n)
pi workspace (n) [integer]
res residual yhat-y (n)
rw robustness weights (n)
s smoothed values at z (m)
setLf in lowesb, save matrix factorizations [logical]
(needed for lowesr and lowesl)
tau dimension of local model = iv(DIM);
=d+1 for linear, (d+2)(d+1)/2 for quadratic
reduced if dropping squares
tau0 =d+1 for linear, (d+2)(d+1)/2 for quadratic
tdeg polynomials to fit; 0=constants, 1=linear, 2=quadratics
trL trace L = sum diagL
v workspace (lv)
w weights (n) local regression: min sum wi * (f(xi)-yi)^2
x sample locations (n,d)
y observations (n)
yhat smoothed y (n)
ytilde pseudo y (n)
z locations where smooth is desired (m,d)
If using the double precision version, [real] above should be understood
as Fortran "double precision".
The first argument to lowesd is a version number, updated when calling
sequences change.
If you peek inside the fortran, you will quickly notice that it
was machine generated; the typeset original (in the language "pine")
is much easier to read.
=== iv indices ===
1 INFO return code (not currently used)
2 D number of independent variables
3 N number of observations
4 VC 2^d (number of vertices of a cell)
5 NC number of k-d cells
6 NV number of k-d vertices
7 A1 starting index in iv of a
8 C1 starting index in iv of c
9 HI1 starting index in iv of hi
10 LO1 starting index in iv of lo
11 V1 starting index in v of vertices
12 XI1 starting index in v of cut values
13 VV1 starting index in v of vertex values
14 NVMAX maximum allowed value of nv
15 WORK1 starting index in v of workspace
16 WORK2 starting index in v of workspace
17 NCMAX maximum allowed value of nc
18 WORK3 starting index in v of workspace
19 NF floor(n*f) (number of points used as neighborhood)
20 KERNEL 1=tricube, 2=unif
21 KIND 1=k-d,cubic blend, (not implemented:2=quadtree,3=triangulation)
22 PI1 starting index in iv of tree permutation
23 VH starting index in iv of vhit
24 VV2 starting index in v of work vval used in trL computation
25 LQ starting index in iv of Lq
26 WORK4 starting index in v of workspace
27 PSI1 starting index in iv of workspace permutation
28 SEQ sequence number, to check if routines called out of order
takes on values:
171 after lowesd
172 after lowesf
173 after lowesb
29 DIM dimension of local regression
1 constant
d+1 linear (default)
(d+2)(d+1)/2 quadratic
Modified by ehg127 if cdeg<tdeg.
30 SING number of times singular tolerance was met in l2fit, l2tr
31 PRINT verbose output?
32 DEG total degree (of polynomial for local model)
33 NDIST dd = variables 1:dd enter into distance calculation
34 LF starting index in v of Lf
35..40 reserved for future use
41..49 CDEG componentwise degree
iv(A1) a coordinate of cut; 0 for leaf (nc)
iv(C1) c pointers to corners (index into vertex array v) (vc,nc)
iv(HI1) hi right subcell (nc)
iv(LO1) lo left subcell (nc)
Leaf cell j encloses points x(pi(i),), lo(j)<=i<=hi(j).
Also, iv(C1),...,iv(PI1-1) is used as workspace (t) by l2fit
------------------------eval only needs workspace up to here
iv(PI1) pi permutation of 1:n for listing points in cells
iv(VH) vhit cell whose subdivision creates vertex (nv)
0 if vertex is corner of original bounding box.
iv(LQ) Lq active point indices for block of Lf (nvmax*nf)
iv(PSI1) psi workspace permutation of 1:n for sorting distances
=== v indices ===
1 F fraction of n to be used as neighborhood. See also iv(19).
2 FCELL no refinement if #points <= fcell * n
default .05
3 FDIAM no refinement if diameter is fdiam * overall bounding box
default 0; Warning: reset to 0 by ehg142 when nsteps>0.
4 RCOND reciprocal condition number
... 49 reserved for future use
iv(V1) v vertices (nv,d)
iv(VV1) vval vertex values (0:d,nv)
iv(XI1) xi cut values (nc)
------------------------eval only needs workspace up to here
iv(WORK1) workspace (n) l2fit:dist
iv(WORK2) workspace (nf) l2fit:eta
iv(WORK3) workspace (dim,nf) l2fit:X
iv(VV2) vval2 workspace ((d+1)*nv) pseudo-vval for trL computation
iv(LF) Lf hat matrix (data to vertex) ((d+1)*nvmax*nf)
iv(WORK4) workspace (nf) l2fit:w
Internal routine names have been hidden as follows:
ehg106 select q-th smallest by partial sorting
ehg124 rbuild
ehg125 cpvert
ehg126 bbox
ehg127 l2fit,l2tr computational kernel
ehg128 eval
ehg129 spread
ehg131 lowesb after workspace expansion
ehg133 lowese after workspace expansion
ehg134 abort by calling S Recover function
ehg136 l2fit with hat matrix L
ehg137 vleaf
ehg138 descend
ehg139 l2tr
ehg140(w,i,j) w(i)=j used when w is declared real, but should store an int
ehg141 delta1,2 from trL
ehg142 robust iteration
ehg144 now called lowesc
ehg152 like ehg142, but for lowesf
ehg167 kernel for losave
ehg168 kernel for loread
ehg169 compute derived k-d tree information
ehg170 generate Fortran
ehg176,ehg177,ehg178,ehg179,ehg180,ehg181 loeval for delta
ehg182 ehgdie(i)
ehg183 warning(message,i,n,inc)
ehg184 warning(message,x,n,inc)
ehg190 now called lowesa, with slight change in calling sequence
ehg191 lowesl after workspace expansion
ehg192 lowesr after workspace expansion
ehg196(tau,d,f,trl) trL approximation
ehg197 for deg=1,2
m9rwt now called lowesw
pseudo now called lowesp