Geert Molenberghs got his start in mathematics, but quickly switched to statistics and eventually obtained a Ph.D. with a focus on medical statistics. His work covers longitudinal data, repeated measures, incomplete or missing data, and categorical as well as continuous data. In this interview he talks about his books, offers tips and strategies for top SAS programmers, and describes how his books illustrate the proposed methodology with SAS code and SAS programs for data analysis. His tips are easy and practical: know what you're doing in SAS, be patient by trying several starting values, build a heuristic library by talking to others and gaining experience as you go along, and keep familiar with developments in surrounding areas such as numerical analysis. Geert also talks about the SAS procedures he uses most often: PROC MIXED, PROC GENMOD, PROC NLMIXED, PROC GEE, PROC GLIMMIX. He has always valued the way SAS as a company has maintained links with the community and is happy to contribute his own expert advice.
Prof. Geert Molenberghs
Thank you. My career has been going on for several decades now. Apart from a start in pure mathematics, because I set out in algebra, I very quickly switched to statistics, in particular applied statistics, with a focus on medical statistics for my Ph.D. And ever since I have been working in that area. Throughout, I've been working on longitudinal data, repeated measures and so on, incomplete or missing data, and categorical as well as continuous data. In particular, I focus on clinical trials: missing data in clinical trials, how to analyze clinical trials, and some emphasis has always been placed on surrogate marker evaluation, so surrogate endpoints instead of the true endpoints in a clinical trial. Most of those have indeed led to some research output and, every now and then, a book, and that includes books in the SAS series.
One of the earlier ones, well, it's not the absolute earliest, is the 2000 book published by Springer, which I wrote together with Geert Verbeke, on longitudinal data of a continuous, so Gaussian, nature. That was almost two decades ago. It was followed by a book on the categorical version, published five years later in 2005. In both of those we have extensively illustrated the methodology that we propose with SAS code, with SAS programs for the data analysis.
Can you propose three valuable tips or strategies that are necessary to become a top SAS programmer?
Prof. Geert Molenberghs
That's a good question. It's also a bit difficult to answer, because I definitely do not consider myself an expert or top SAS programmer, which may also be true for the other packages I'm working with. As academics, we are working on many different projects. We teach, we develop methodology, we look at the mathematics, we do consulting, and frequently there is programming involved. But with that in mind, I've seen a few important things for myself, but also for the students I'm teaching and the collaborators. And that is, first and foremost: know what you're doing. If you do SAS programming, it's usually about relatively sophisticated data analysis tools; let's call them models. So if you program a model, don't go by something that looks similar and start modifying that code. Really know what you're doing, so that you see the link from the math, the modeling equation, to what you're doing in SAS. Make sure that the right thing comes out at the end of the day by really having a good working knowledge of the strategy you're working with. Familiarize yourself with what you're going to program before you program. I think that's the first thing I would definitely say.
The second piece of advice I would like to give to somebody programming, especially if you're dealing with more complicated models, is: be patient! A program needs to be syntactically correct to work, but let's assume that's the case. Many statistical procedures are not so straightforward, in the sense that you may fit the model to a set of data and it may not work simply because the starting values are not optimal or not well chosen, etc. If you approach that with the mindset "tell me what the perfect starting values are", it is going to be disappointing, because in many cases there is no absolutely clear scientific reason why you should use one or the other. So don't agonize over it; just be patient and try several values, and build, let's say, your heuristic library by talking to others, by working on it yourself, and by building experience as you go, and you'll see that before you know it, it starts working.
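To make the "try several starting values" advice concrete, here is a minimal sketch in Python rather than SAS. It is not from the interview: the Cauchy location model, the single observation, and the function names (`cauchy_loglik`, `newton_fit`, `multi_start`) are all assumptions chosen only because Newton-Raphson is known to be sensitive to its starting point in this toy setting.

```python
import math

def cauchy_loglik(theta, data):
    """Cauchy location log-likelihood, up to an additive constant."""
    return -sum(math.log(1.0 + (y - theta) ** 2) for y in data)

def newton_fit(start, data, tol=1e-8, max_iter=50):
    """Plain Newton-Raphson on the score equation.

    Returns the estimate, or None if the iteration does not converge
    within max_iter steps (the typical symptom of a poor starting value).
    """
    theta = start
    for _ in range(max_iter):
        resid = [y - theta for y in data]
        score = sum(2 * r / (1 + r * r) for r in resid)
        hess = sum(2 * (r * r - 1) / (1 + r * r) ** 2 for r in resid)
        if hess == 0:
            return None  # Newton step undefined at this point
        step = score / hess
        theta -= step
        if abs(step) < tol:
            return theta
    return None

def multi_start(starts, data):
    """Try every starting value; keep the fits that converged."""
    fits = {}
    for s in starts:
        est = newton_fit(s, data)
        if est is not None:
            fits[s] = est
    # Among converged fits, prefer the one with the highest log-likelihood.
    best = max(fits.values(), key=lambda t: cauchy_loglik(t, data)) if fits else None
    return fits, best

# Hypothetical toy data: one observation at 3.0.
data = [3.0]
fits, best = multi_start([-5.0, 0.0, 2.8, 10.0], data)
print("converged starts:", sorted(fits))
print("best estimate:", best)
```

In this toy case only the start close to the observation converges; the others drift away, which is the behavior described above. In SAS, the analogous step would be supplying different initial values to the procedure being used, rather than writing the optimizer by hand.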
And I think as a third piece of advice: keep fairly familiar with developments in surrounding areas. By that I mean, yes, we talked about statistics and how models are translated into programming, but as I was just saying, programming, getting models to converge, is also numerical analysis. So a good SAS programmer should have a very good working knowledge both of the statistical area in which they are working and of the numerical analysis needed to make it work. There are a lot of things happening, there are a lot of trends, especially in this big data and data science era, so keep abreast of the evolutions there by following the literature, by attending conferences, etc., and by talking to other people. As an aside, I think conferences, especially but not only the SAS conferences, for example SAS Global Forum, are extremely relevant in that sense.
Which SAS procedure do you find most useful (do you use most often)?
Prof. Geert Molenberghs
Yeah! Well, that's perhaps an even more difficult question than the previous one, because there are several procedures that are really close to my research area and, I would almost say, close to my heart. There are of course the various longitudinal procedures like PROC MIXED, PROC NLMIXED, PROC GEE, PROC GENMOD; we can go on for a while. And then there are the missing data procedures, in particular PROC MI and PROC MIANALYZE. Now, that's not an answer to the question, because I gave too many procedures. But let me single out PROC MIXED in SAS, because it's actually the one we were using when our book was published, not even our 2000 book but the 1997 book in the Lecture Notes in Statistics series of Springer. That book came into being because we taught a short course in our own university on longitudinal data using the then brand-new procedure MIXED. So it was foundational in that sense, and all the rest followed. So if you want a singular rather than a plural answer, I would say the MIXED procedure.
For example, there is the more recent book edited by Alex Dmitrienko and Gary Koch, to which I was happy to contribute on the chapter on missing data in clinical trials. Center stage there were the MIXED procedure, GLIMMIX, etc., the longitudinal procedures that I mentioned. But in particular, I think nowadays we can do so many more things when it comes to analyzing incomplete data, longitudinal or otherwise, thanks to the development of a few tools, in particular the new procedure GEE, which allows for weighted estimating equations along with the standard ones. I think that was a great achievement. Also PROC MI: the procedure has been around for quite a while, but it has grown incredibly flexible, and almost all methods are implemented, including the full conditional specification method. As a final point on that procedure, I would like to mention the tools that are now available for sensitivity analysis around missing data: the MNAR statement in that procedure. Without getting too technical, sensitivity analysis was recommended until recently, but you had to rely on user-written software, etc. It has become an almost routine way, or at least a more routine way than it used to be, of analyzing data, just because of the availability of that procedure. So the ones that I added are GEE and MI.
Any closing words?
I've always valued the way that SAS as a company, the developers, etc., have maintained links, communication links or expert advice links or whatever you call them, with the user community. That's very important, of course. I think it's a mutual benefit. For us as users, but also as authors of books, it's very important that the procedures that we use and that we include as illustrations in our articles are state of the art. I think for the company it's useful to also know where the research community is heading, and of course it's heading in many different directions at the same time. But that dialogue, which happens formally and especially also informally, is of the utmost importance.
I think that's really a strength and an asset, the way SAS is developing and the community too.
Thank you so much for sharing that with us, Prof. Geert Molenberghs.