!************************************************************************* !* * !* DBDC-CSVLR - Double bundle method DBDC for clusterwise * !* support vector linear regression * !* * !* VERSION 2 * !* * !* by Kaisa Joki 2018 (last modified 16.9.2019). * !* The code is partly based on codes by Napsu Karmitsa 2016 and * !* by Adil Bagirov 2015 * !* * !* The software is free for academic teaching and research * !* purposes but I ask you to refer the appropriate references * !* given below, if you use it. * !* * !************************************************************************* !* !* !* Codes included: !* !* dbdc_csvlr.f95 - Main program for clusterwise support vector linear regression !* software (this file). !* !* parameters.f95 - Parameters. Includes modules: !* - r_precision - Precision for reals, !* - param - Parameters, !* - exe_time - Execution time. !* !* initcsvlr.f95 - initialization of parameters for clusterwise support vector !* linear regression, LMBM and DBDC. Includes modules: !* - initclr - Initialization of parameters for !* clusterwise support vector linear regression. !* - initlmbm - Initialization of LMBM. !* - initdbdc - Initialization of DBDC. !* !* clrfunctionmod.f95 - Computation of function and (sub)gradients values needed to !* define the clusterwise support vector linear regression problem. !* clrobjfun.f95 - Computation of the value of the clusterwise support vector linear regression !* problem and the corresponding subgradient. !* !* dbdc.f95 - DBDC - double bundle method !* bundle1.f95 - Bundle of DC component f_1 (used in DBDC) !* bundle2.f95 - Bundle of DC component f_2 (used in DBDC) !* functions.f95 - function values of DC components and the corresponding subgradients (used in DBDC) !* !* plqdf1.f - Quadratic solver by Ladislav Luksan !* !* lmbmclrmod.f95 - Subroutines for clusterwise linear regression used in LMBM. !* lmbm.f95 - LMBM - limited memory bundle method. !* subpro.f95 - subprograms for LMBM. !* !* Makefile - makefile. !* !* !* To use the software modify initcsvlr.f95 as needed. !* !* !* References: !* !* !* for DBDC-CSVLR: !* !* K. Joki, A.M. Bagirov, N. Karmitsa, M.M. Mäkelä, S. Taheri, "Clusterwise support vector linear regression", !* submitted (2019). !* !* K. Joki, A.M. Bagirov, N. Karmitsa, M.M. Mäkelä, S. Taheri, "New Bundle Method for Clusterwise Linear !* Regression Utilizing Support Vector Machines", TUCS Technical Report No. 1190, Turku Centre for !* Computer Science, Turku (2017). !* !* for DBDC: !* K. Joki, A.M. Bagirov, N. Karmitsa, M.M. Mäkelä, S. Taheri, "Double Bundle Method for Finding Clarke !* Stationary Points in Nonsmooth DC Programming", SIAM Journal on Optimization, Vol. 28, No. 2, !* pp. 1892-1919, 2018. !* !* for LMBM-CLR: !* N. Karmitsa, A. Bagirov, S. Taheri, "Clusterwise Linear Regression using LMBM" !* !* for LMBM: !* N. Haarala, K. Miettinen, M.M. Mäkelä, "Globally Convergent Limited Memory Bundle Method !* for Large-Scale Nonsmooth Optimization", Mathematical Programming, Vol. 109, No. 1, !* pp. 181-205, 2007. DOI 10.1007/s10107-006-0728-2. !* !* M. Haarala, K. Miettinen, M.M. Mäkelä, "New Limited Memory Bundle Method for Large-Scale !* Nonsmooth Optimization", Optimization Methods and Software, Vol. 19, No. 6, pp. 673-692, 2004. !* DOI 10.1080/10556780410001689225. !* !* N. Karmitsa, "Numerical Methods for Large-Scale Nonsmooth Optimization" in Big Data !* Optimization. A. Emrouznejad (eds.), Springer, 2016. !* !* for NSO clusterwise linear regression (objfunc=1): !* A. Bagirov, N. Karmitsa, M.M. Mäkelä, "Introduction to nonsmooth optimization: theory, !* practice and software", Springer, 2014. !* !* !* Acknowledgements: !* !* The work was financially supported by the University of Turku Graduate School UTUGS Matti Programme, !* the University of Turku, the Academy of Finland (Project No. 289500 and 294002) and the Australian !* Research Council's Discovery Projects funding scheme (Project No. DP140103213). !* !***************************************************************************************** !* !* * PROGRAM csvlr * !* !* Main program for nonsmooth clusterwise linear regression utilizing support vector machines !* software with DBDC and LMBM. !* !***************************************************************************************** PROGRAM lmbmclr USE r_precision, ONLY : prec ! Precision for reals. USE param, ONLY : zero, one, large ! Parameters. USE initclr, ONLY : & ! Initialization of parameters. infile, & ! Dataset file. outfile1, & ! Result file with hyperplanes. outfile2, & ! Result file with predictions. outfile3, & ! Result file with training test. outfile4, & ! AVARAGE: Result file with training test. optmet, & ! Optimization method, from user: ! 1 = LMBM, ! 2 = DBDC. maxdim, & ! Maximum number of variables for optimization. maxinit, & ! Maximum number of candidate points of data set. a, & ! Data matrix a(nrecord,nft), from input file. a4, & ! Auxiliary data matrix a4(nrecord,nft). kvad, & ! Scaling vector for kvadratic terms. gamma1, & ! Parameter affecting the size of the set of starting point. gamma2, & ! Parameter affecting the size of the set of starting point. gamma3, & ! Parameter affecting the size of the set of starting point. gamma4, & ! Parameter affecting the size of the set of starting point. gam1, & ! Multiplier for gamma1. gam2, & ! Multiplier for gamma2. gam3, & ! Multiplier for gamma3. gam4, & ! Multiplier for gamma4. dminim, & ! toler, & ! tlimit, & ! Time limit, from user. noutcom, & ! Number of the output column, form user. itest, & ! Selection of test set usage: ! 1: no test set, ! 2: one training set and one test set, ! 3: cross-validation. ntrain, & ! Number of points in training set. nfoldmax, & ! Number of cross validations. iscale, & ! Selection of scaling: ! 1: scaling, ! 2: no scaling. nupdate, & ! Number of updates. naux, & ! Number of auxiliary function evaluations. naux2, & ! Number of first auxiliary function evaluations. nfit, & ! Number of function evaluations. nclust, & ! Maximum number of clusters, from user. nft, & ! Number of features in data, from user. nrecord, & ! Number of instances in data, from user. nrecord1, & ! Number of instances used in computions. nrecord2, & ! Number of instances used in computions. nc, & ! Current number of clusters, loops from 1 to nclust. ns, & ! Switch for auxiliary and real clustering problem. nel1, & nob1, & nl, & nlist, & nlisttrain, & init_clrpar, & ! Furher initialization of parameters. def_clrpar ! Default values of clustering parameters. USE clrmod, ONLY : & ! Subprograms for clusterwise linear regression. scaling, & ! ok rescaling, & ! ok distribution, & ! ok step1, & ! ok cleaning, & ! ok checkfinal ! ok USE clrfunctionmod, ONLY : & clrfunc, & ! Function calculus. clrfunc1, & ! Function calculus. clrfunc2, & ! Function calculus. auxfunc, & ! Function calculus. auxfunc1, & ! Function calculus. auxfunc2, & ! Function calculus. fvb1, & ! Function calculus. fvb2, & ! Function calculus. gradient1, & ! Function calculus. gradient2, & ! Function calculus. hauxfunc ! Function calculus USE lmbm_mod, ONLY : optim2 ! LMBM method for optimization. USE dbdc, ONLY : optim3 ! DBDC method for optimization. !USE bfgs_mod, ONLY : optim1 ! BFGS method for solving auxiliary problem. USE exe_time, ONLY : getime ! Execution time. IMPLICIT NONE REAL(KIND=prec), DIMENSION(:), allocatable :: & x, & kokoko, & x1, & xbest, & x4, & z, & !z1, & d8, & fv, & fv2, & a1, & xc, & avRMSE, & avCE, & avR REAL(KIND=prec), DIMENSION(:,:), allocatable :: & ftr, & ftest, & fscale, & rmsetr, & rmsetest, & rtest, & rtr, & cetr, & cetest REAL(KIND=prec) :: & tol2, & p2,p3,p4, & d7,d9, & f,f2,f3, & fkokoko, & fkokoko1, & fkokoko2, & eka,toka, & osa1,osa2, & fv1, & fbest,fb, & fbest1, & scale_f, & scale1,scale2, & fmin,fmin1, & times, & ! Starting time timefin, & ! Finishing time timetot, & ! Total time avarage, & ! avarage r1, r2, & ff,ff1,rmse1,rmse2,ce1,ce2, koe(maxdim),koe1(maxdim),koe2(maxdim) REAL :: & time1, & ! Starting time time4, & ! Finishing time timef ! CPU time used INTEGER :: & nrecord3, & nrecord4, & i,j,j1,k, & jpoints, & nfold, & n1,nl1, & ncount, & neltotal, & ngood0, & naux2tot, & nauxtot, & nfittot, & ii ! Intrinsic Functions INTRINSIC MAX allocate(a1(nft),x(maxdim),xc(maxinit),x1(maxdim),x4(maxdim),xbest(maxdim),z(maxdim),kokoko(maxdim),& d8(nrecord),fv(nrecord),fv2(nrecord),avRMSE(nclust),avCE(nclust),avR(nclust)) allocate(ftr(nclust,nfoldmax),ftest(nclust,nfoldmax),fscale(nclust,nfoldmax),rmsetr(nclust,nfoldmax),& rmsetest(nclust,nfoldmax),cetr(nclust,nfoldmax),cetest(nclust,nfoldmax),rtr(nclust,nfoldmax),rtest(nclust,nfoldmax)) tol2=1.0E-7_prec*REAL(nrecord,prec) !tol2=1.0E-6_prec*REAL(nrecord,prec) IF (nrecord > 15000) tol2 = 1.0E-4_prec*REAL(nrecord,prec) CALL init_clrpar() CALL def_clrpar() dminim=large OPEN(44,file=outfile1) OPEN(41,file=outfile2) OPEN(43,file=outfile3) OPEN(45,file=outfile4) OPEN(78,file=infile,status='old',form='formatted') IF(optmet==1) THEN WRITE(44, *) 'Optimization with LMBM.' WRITE(41, *) 'Optimization with LMBM.' WRITE(43, *) 'Optimization with LMBM.' WRITE(45, *) 'Optimization with LMBM.' ELSE WRITE(44, *) 'Optimization with DBDC.' WRITE(41, *) 'Optimization with DBDC.' WRITE(43, *) 'Optimization with DBDC.' WRITE(45, *) 'Optimization with DBDC.' END IF CALL getime(time1) DO i=1,nrecord READ(78,*) (a(i,k),k=1,nft) END DO WRITE(44,*) 'Number of records: ',nrecord,' Number of features: ',nft WRITE(44,*) !================================================================ IF (noutcom < nft) THEN DO i=1,nrecord j1=0 DO j=1,nft IF (j /= noutcom) then j1=j1+1 a1(j1)=a(i,j) END IF END DO a1(nft)=a(i,noutcom) DO j=1,nft a(i,j)=a1(j) END DO END DO END IF DO i = 1, nft-1 kvad(i)=1.0_prec END DO !================================================================= ! Division to training and test sets (not randomly) !================================================================= outerloop: DO nfold = 1,nfoldmax IF (itest <= 2) nrecord2 = nrecord - ntrain IF (itest == 3) THEN nrecord2 = nrecord/nfoldmax nrecord3=(nfold-1)*nrecord2+1 nrecord4=nfold*nrecord2 nl=0 DO j=nrecord3,nrecord4 nl=nl+1 nlist(j-nrecord3+1)=j END DO nl1=0 innerloop1: DO i=1,nrecord DO j=1,nl IF (i == nlist(j)) CYCLE innerloop1 END DO nl1 = nl1+1 DO k=1,nft a4(nl1,k)=a(i,k) END DO nlisttrain(nl1)=i END DO innerloop1 ELSE IF (itest == 1) THEN nl=0 DO k=1,nft DO i=1,nrecord a4(i,k)=a(i,k) END DO END DO DO i=1,nrecord nlisttrain(i)=i END DO ELSE IF (itest == 2) THEN DO i=1,ntrain DO k=1,nft a4(i,k)=a(i,k) END DO nlisttrain(i)=i END DO nl=nrecord-ntrain DO i=1,nl nlist(i)=i+ntrain END DO END IF nrecord2=nl nrecord1=nrecord-nrecord2 !================================================ IF (iscale == 1) CALL scaling !=============================================== p2 = zero p3 = zero p4 = zero ncount = 0 nfittot=0 nauxtot=0 naux2tot=0 innerloop2: DO nc=1,nclust naux = 0 naux2 = 0 nfit = 0 neltotal = 0 PRINT 3,nc 3 FORMAT(' The number of hyperplanes:',i5) IF (nc > 1) THEN gamma1=gam1*gamma1 gamma2=gam2*gamma2 gamma3=gam3*gamma3 gamma4=gam4*gamma4 IF (gamma1 > one) gamma1=one IF (gamma2 < 1.005) gamma2=1.005_prec IF (gamma3 < 1.005_prec) gamma3=1.005_prec IF (gamma4 > one) gamma4=one CALL step1(x,xc,ngood0) fmin=large DO i=1,ngood0 DO j=1,nft z(j)=xc(j+(i-1)*nft) END DO d7=large d9=zero d8(:)=zero DO k=1,nrecord1 d8(k)=zero DO j=1,nft-1 d8(k)=d8(k)+(a4(k,j)-z(j))**2 END DO IF (d8(k) <= d7) d7=d8(k) IF (d8(k) > d9) d9=d8(k) END DO d7=d7+gamma4*(d9-d7) nel1=0 DO k=1,nrecord1 CALL hauxfunc(z,1,k,f3) IF (f3 < dminim(k)) THEN IF (d8(k) <= d7) THEN nel1=nel1+1 nob1(nel1)=k END IF END IF END DO IF (nel1 > 0) THEN nupdate=0 ! Solving auxiliary problem ns=3 ! The special method is used to solve this problem ! ns=0 ! NO special method is used to solve this problem IF (optmet==1) THEN CALL optim2(z,xc((i-1)*nft+1:),f) ELSE CALL optim3(z,xc((i-1)*nft+1:),f) END IF IF (optmet==1) THEN CALL auxfunc(xc((i-1)*nft+1:),f) ELSE CALL auxfunc1(xc((i-1)*nft+1:),eka) CALL auxfunc2(xc((i-1)*nft+1:),toka) f = eka - toka END IF neltotal=neltotal+nel1*nupdate+nrecord1 ncount=ncount+1 ELSE ! nel1 == 0 f=large END IF fv2(i) = f IF (f < fmin) fmin=f END DO fmin1=gamma2*fmin jpoints=0 DO i=1,ngood0 fv1=fv2(i) IF (fv1 <=fmin1) THEN jpoints=jpoints+1 DO j=1,nft xc(j+(jpoints-1)*nft)=xc(j+(i-1)*nft) END DO END IF END DO CALL cleaning(jpoints,xc) ngood0=jpoints fbest=large DO i=1,ngood0 ns=1 DO j=1,nft z(j)=xc(j+(i-1)*nft) END DO IF (optmet==1) THEN CALL optim2(z,xc((i-1)*nft+1:),f) !CALL optim2(z,z1,f) ELSE CALL optim3(z,xc((i-1)*nft+1:),f) END IF ncount=ncount+1 fv(i)=f IF (f < fbest) fbest=f END DO fbest1=gamma3*fbest jpoints=0 DO i=1,ngood0 fv1=fv(i) IF (fv1 <= fbest1) THEN jpoints=jpoints+1 DO j=1,nft xc(j+(jpoints-1)*nft)=xc(j+(i-1)*nft) END DO END IF END DO CALL cleaning(jpoints,xc) ngood0=jpoints fb=large DO i=1,ngood0 ns=2 DO j=1,nft x(j+(nc-1)*nft)=xc(j+(i-1)*nft) END DO IF (optmet==1) THEN CALL optim2(x,x1,f) ELSE CALL optim3(x,x1,f) END IF ncount=ncount+nc IF (f < fb) THEN fb=f DO k=1,nc DO j=1,nft xbest(j+(k-1)*nft)=x1(j+(k-1)*nft) END DO END DO END IF END DO DO k=1,nc DO j=1,nft x(j+(k-1)*nft)=xbest(j+(k-1)*nft) END DO END DO ELSE ! nc == 1 DO j=1,nft z(j)=one END DO nel1=nrecord1 DO k=1,nrecord1 nob1(k)=k END DO nupdate=0 ns=3 ! The special method is used to solve the problem !ns=0 ! NO special method used for this problem IF (optmet==1) THEN CALL optim2(z,x,f) ELSE CALL optim3(z,x,f) END IF IF (optmet==1) THEN CALL clrfunc(x,f) ELSE CALL clrfunc1(x,eka) CALL clrfunc2(x,toka) f = eka - toka END IF ncount=ncount+1 neltotal = nel1*nupdate + nrecord1 END IF CALL distribution(x,f) toler=MAX(tol2,tol2*f/REAL(nrecord1,prec)) IF (nrecord > 50000) toler = MAX(tol2,100.0_prec*tol2*f/REAL(nrecord1,prec)) PRINT*,toler, 10.0_prec*tol2*f/REAL(nrecord1,prec) CALL clrfunc1(x,scale1) CALL clrfunc2(x,scale2) scale_f = scale1 - scale2 IF (iscale == 1) THEN call rescaling(x,x4) ELSE DO k=1,nc DO j=1,nft x4(j+(k-1)*nft)=x(j+(k-1)*nft) END DO END DO END IF CALL checkfinal(x4,ff,ff1,rmse1,rmse2,ce1,ce2,r1,r2) ftr(nc,nfold)=ff ftest(nc,nfold)=ff1 fscale(nc,nfold)=scale_f rmsetr(nc,nfold)=rmse1 rmsetest(nc,nfold)=rmse2 cetr(nc,nfold)=ce1 cetest(nc,nfold)=ce2 rtr(nc,nfold)=r1 rtest(nc,nfold)=r2 naux2tot=naux2tot+naux2 nauxtot=nauxtot+naux nfittot=nfittot+nfit WRITE(44,*) WRITE(44,573) nc 573 FORMAT('No of hyperplanes:',I4) WRITE(44,*) WRITE(41,*) WRITE(41,*) DO k=1,nc WRITE(41,722) (x4(j+(k-1)*nft),j=1,nft) END DO 722 FORMAT(20f12.5) WRITE(44,543) ff 543 FORMAT('Fit function value:',f24.8) WRITE(44,*) WRITE(44,545) scale_f 545 FORMAT('Fit function value in scaled data:',f24.8) WRITE(44,*) WRITE(44,541) ncount 541 FORMAT('The number of linear regression problems solved:',i10) WRITE(44,542) naux2tot,nauxtot,nfittot 542 FORMAT('The numbers of evaluations:',i10,i10,i10) CALL getime(time4) timef=time4-time1 WRITE(44,*) WRITE(44,141) timef PRINT*,'obj_scale:',scale_f,'obj:',ff,'time',timef WRITE(44,*) write(44,*) WRITE(44,*) p2=p2+REAL(neltotal,prec) p3=p3+REAL(naux*nrecord1,prec) IF(nc == 1) THEN p4=p4+REAL(neltotal,prec) ELSE p4=p4+REAL(nfit*nrecord*nc,prec) END IF IF (timef > tlimit) EXIT outerloop END DO innerloop2 141 FORMAT('CPU time:',f12.3) END DO outerloop 210 CONTINUE WRITE(43,803) 803 FORMAT('Fit function values on training set:') WRITE(43,*) DO i=1,nclust WRITE(43,332) i WRITE(43,331) (ftr(i,j),j=1,nfoldmax) END DO IF (itest >= 2) THEN WRITE(43,804) 804 FORMAT('Fit function values on test set:') WRITE(43,*) DO i=1,nclust WRITE(43,332) i WRITE(43,331) (ftest(i,j),j=1,nfoldmax) END DO END IF 331 FORMAT(5f28.6) 332 FORMAT(i6) WRITE(43,*) WRITE(45,*) WRITE(43,802) 802 FORMAT('RMSE, CE and r coefficients for training set:') DO i=1,nclust WRITE(43,231) i avRMSE(i)=0.0_prec avCE(i)=0.0_prec avR(i)=0.0_prec DO ii = 1,nfoldmax avRMSE(i) = avRMSE(i)+rmsetr(i,ii) avCE(i) = avCE(i)+cetr(i,ii) avR(i) = avR(i)+rtr(i,ii) END DO avRMSE(i) = avRMSE(i)/REAL(nfoldmax,prec) avCE(i) = avCE(i)/REAL(nfoldmax,prec) avR(i) = avR(i)/REAL(nfoldmax,prec) WRITE(43,230) (rmsetr(i,n1),n1=1,nfoldmax) WRITE(43,230) (cetr(i,n1),n1=1,nfoldmax) WRITE(43,230) (rtr(i,n1),n1=1,nfoldmax) WRITE(43,*) WRITE(43,*) 'Avarage RMSE:', avRMSE(i) WRITE(43,*) 'Avarage CE:', avCE(i) WRITE(43,*) 'Avarage R:', avR(i) WRITE(43,*) WRITE(45,*) END DO WRITE(45,*) WRITE(45,819) 819 FORMAT('Avarage RMSE for training set:') WRITE(45,*) DO i= 1,nclust WRITE(45,230) avRMSE(i) END DO WRITE(45,*) WRITE(45,829) 829 FORMAT('Avarage CE for training set:') WRITE(45,*) DO i= 1,nclust WRITE(45,230) avCE(i) END DO WRITE(45,*) WRITE(45,839) 839 FORMAT('Avarage R for training set:') WRITE(45,*) DO i= 1,nclust WRITE(45,230) avR(i) END DO IF (itest > 1) THEN WRITE(43,*) WRITE(43,*) WRITE(43,801) 801 FORMAT('RMSE, CE and r coefficients for test set:') DO i=1,nclust WRITE(43,231) i avRMSE(i)=0.0_prec avCE(i)=0.0_prec avR(i)=0.0_prec DO ii = 1,nfoldmax avRMSE(i) = avRMSE(i)+rmsetest(i,ii) avCE(i) = avCE(i)+cetest(i,ii) avR(i) = avR(i)+rtest(i,ii) END DO avRMSE(i) = avRMSE(i)/REAL(nfoldmax,prec) avCE(i) = avCE(i)/REAL(nfoldmax,prec) avR(i) = avR(i)/REAL(nfoldmax,prec) WRITE(43,230) (rmsetest(i,n1),n1=1,nfoldmax) WRITE(43,230) (cetest(i,n1),n1=1,nfoldmax) WRITE(43,230) (rtest(i,n1),n1=1,nfoldmax) WRITE(43,*) WRITE(43,*) 'Avarage RMSE:', avRMSE(i) WRITE(43,*) 'Avarage CE:', avCE(i) WRITE(43,*) 'Avarage R:', avR(i) WRITE(43,*) WRITE(45,*) END DO 231 FORMAT(i10) 230 FORMAT(10f24.8) WRITE(45,*) WRITE(45,849) 849 FORMAT('Avarage RMSE for test set:') WRITE(45,*) DO i= 1,nclust WRITE(45,230) avRMSE(i) END DO WRITE(45,*) WRITE(45,859) 859 FORMAT('Avarage CE for test set:') WRITE(45,*) DO i= 1,nclust WRITE(45,230) avCE(i) END DO WRITE(45,*) WRITE(45,869) 869 FORMAT('Avarage R for test set:') WRITE(45,*) DO i= 1,nclust WRITE(45,230) avR(i) END DO END IF CLOSE(40) CLOSE(41) CLOSE(43) CLOSE(44) CLOSE(78) deallocate(a1,x,x1,x4,xbest,z,d8,fv,fv2) !deallocate(a1,x,x1,x4,xbest,z,z1,d8,fv,fv2) deallocate(ftr,ftest,fscale,rmsetr,rmsetest,cetr,cetest,rtr,rtest) deallocate(avR,avRMSE,avCE) STOP END PROGRAM lmbmclr