Feature: Chemical Engineering
John MacGregor uses multivariate models to improve the bottom line for some of the world’s biggest chemical companies.
By Tyler Irving
Sophisticated online sensors and increasingly efficient computer storage banks have given chemical engineers access to more data about their processes than ever before. But more data does not necessarily equate to more knowledge, and few people understand that as well as John MacGregor. MacGregor, professor emeritus at McMaster University, has dedicated his career to making sense of the large and messy data sets generated by industrial processes. As president of ProSensus Inc., he has helped Fortune 500 companies reduce costs, increase yields, and improve the quality of products from polymers to potato chips. ACCN spoke with MacGregor to find out how engineers can make the most of industrial process data.
ACCN You have said that learning from data is “the engineer’s Achilles heel.” Why?
JM During their undergraduate programs, scientists and engineers are taught very simple statistical methods, aimed at analysing a small number of variables in a designed experiment where all the variables are assumed to be independent. This is very different from the data we collect on industrial processes. Those data sets are huge, with hundreds or thousands of variables from various points in the system: temperatures, pressures, flow measurements and so on. Up to 20 per cent of the data is missing. Most importantly, this data is not independent: when certain things happen in the process, many variables move together. What that says is that the system is really moving in a much smaller space, maybe only five or six dimensions.
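To make the low-dimensional idea concrete, here is a minimal Python sketch. It is not MacGregor’s code, and the sensor count, number of hidden factors and noise level are invented for illustration: it simulates 300 correlated process measurements driven by five underlying factors, then asks principal component analysis how many directions the data really occupies.

```python
# Illustrative only: hundreds of "sensors" that are really mixtures of a
# few hidden driving forces (feed quality, ambient conditions, etc.).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_samples, n_factors, n_vars = 500, 5, 300   # hypothetical sizes

factors = rng.normal(size=(n_samples, n_factors))   # hidden driving forces
loadings = rng.normal(size=(n_factors, n_vars))     # how sensors mix them
X = factors @ loadings + 0.1 * rng.normal(size=(n_samples, n_vars))

pca = PCA().fit(X)
explained = np.cumsum(pca.explained_variance_ratio_)
print("components for 99% of variance:", np.searchsorted(explained, 0.99) + 1)
# Despite 300 measured variables, about 5 components capture nearly everything.
```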
ACCN These are the latent variables?
JM Yes. Latent variables are just linear combinations of the original hundreds of variables that define the low-dimensional space in which the process moves. They are the underlying or hidden variables that characterize the process. To get at these latent variables, you need multivariate statistical methods. That means modelling not only the properties of the final product, but the process variables themselves. It may seem strange to model the inputs, but if you don’t, you can’t handle missing data, and you don’t get a unique model that you can use for optimization or control. That’s why it is the engineer’s Achilles heel: engineers have never been prepared to handle this type of data, and yet these are the very processes they have to deal with on the job. Modelling the process variables and the product variables (the x and y space) separately is not a totally new idea; the original concept goes back to the introduction of principal component analysis by the statistician Karl Pearson in 1901. But without computers and big database systems, it wasn’t really possible to take advantage of these tools to predict and control the process.
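A standard tool for this kind of modelling is projection to latent structures (partial least squares), which builds a low-dimensional model of the process (x) variables that is predictive of the product (y) variables. Below is a hedged sketch using scikit-learn’s PLSRegression on simulated data; unlike the industrial implementations MacGregor describes, it makes no attempt to handle missing measurements.

```python
# Sketch of latent-variable regression: 150 process variables and 2 quality
# variables, all driven by 4 hidden factors. All sizes are invented.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
n, k, p = 400, 4, 150                  # samples, latent factors, sensors
T = rng.normal(size=(n, k))            # the true latent variables
X = T @ rng.normal(size=(k, p)) + 0.1 * rng.normal(size=(n, p))
Y = T @ rng.normal(size=(k, 2)) + 0.1 * rng.normal(size=(n, 2))

pls = PLSRegression(n_components=k).fit(X, Y)
print("R^2:", pls.score(X, Y))
# pls.x_scores_ holds the estimated latent variables: 150 correlated sensors
# compressed into 4 directions that are predictive of product quality.
```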
ACCN What was the field like when you first started working in it?
JM Until the late 1960s and early 1970s, process data was just recorded with analog instruments; databases really didn’t exist. Even after computers came in and people began to store the data, there was no ability to extract, for example, a group of 30 or 40 variables over a given time period. As late as the early 1990s, you would have needed someone to write special software.
The main objective of databases was to display the data to the operators, not to analyse it. I came to McMaster in 1972 after working with Monsanto in Texas, and I accepted the position with the intention of doing a fair amount of consulting. I felt it was important to play research, teaching and industrial consulting off each other. So I would talk to the big petrochemical companies like ExxonMobil, British Petroleum, Shell and DuPont. I’d ask, “How much money have you made off your databases this year?” and they’d just have a gentle laugh. By the mid-1980s, they stopped laughing. They realized that if they had spent this much money on databases, they should do something with the data they were collecting.
ACCN What happened next?
JM We started getting some grants from companies. At that time, very few academic researchers had grants from industry, and the grants we got were very small. Eventually, we put together something called the McMaster Advanced Control Consortium (MACC). We had six sponsors, big petrochemical companies like Shell, DuPont and Suncor. That eventually expanded to almost 20 big international companies. The consortium is still running today, greatly improving the operations of plants around the world.
ACCN One of your major innovations has been in multivariate modelling of digital images; how did that come about?
JM In the consortium we had a number of companies that made solid products, whether they were pulp and paper companies like Tembec, steel companies like Dofasco or food companies like Frito-Lay. It’s not easy to stick a thermocouple into a potato chip. In the late 1980s we started using colour digital cameras to extract information on product quality. Imaging companies of the time were mostly using black-and-white cameras to simply monitor the process, as operators do now. We looked at multi-spectral images and realized that instead of treating them as images, we could think of them as a source of data. We could use this data to extract information, just like we would use thermocouples or flow meters. Best of all, a good, robust industrial camera costs only a few thousand dollars; if the public didn’t use these things and you had to buy them as an industrial instrument, they would perhaps cost half a million dollars.
ACCN So they’re the same as the digital cameras that everyone is familiar with now?
JM Well, some are line-scan cameras that just capture the image at multiple wavelengths along one line as the material passes underneath. But we can also use an area scan, more like a traditional camera. Many companies around the world do image analysis, but very few of them get into sophisticated analysis in the multi-spectral range. To do that you have to know how to take megapixels of data every second and extract usable information, which means you need multivariate methods. So our past experience, combined with the new technology, enabled us to help companies that couldn’t previously get this kind of online data.
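A simplified sketch of the “image as data” idea (not ProSensus’s actual method; the image sizes, bin counts and training data here are invented): unfold each image so that every pixel becomes an observation and every colour channel a variable, project the pixels onto a latent space, and summarize the whole image as a histogram of its scores, which can then feed an ordinary multivariate model.

```python
# Unfold images into pixel-by-channel data and summarize each image as a
# histogram of latent-variable scores. All numbers here are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
images = rng.random(size=(20, 64, 64, 3))    # 20 dummy RGB "product" images

pixels = images.reshape(-1, 3)               # every pixel is an observation
pca = PCA(n_components=2).fit(pixels)

def image_features(img, bins=30):
    """Compress one image into a score histogram usable as model inputs."""
    scores = pca.transform(img.reshape(-1, 3))
    hist, _ = np.histogramdd(scores, bins=bins, range=[(-2, 2), (-2, 2)])
    return hist.ravel() / hist.sum()

features = np.array([image_features(img) for img in images])
print(features.shape)                        # (20, 900): one row per image
```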
By analysing the spectral data in digital images of corn chips (top row), ProSensus Inc. is able to develop computer models that accurately predict seasoning levels (bottom row) and correlate them to particular process conditions that can be controlled. Systems like this are used for online feedback control in snack food plants around the world.
ACCN Can you give some examples?
JM Frito-Lay was one of our member companies in the consortium, so we started imaging snack food products such as Doritos, Cheetos and Tostitos as they passed by on moving belts. We used the cameras to extract estimates of the distribution of the seasoning applied to the chips and several other organoleptic properties such as texture. They were astounded at the information we were able to extract from the images. Within a couple of years, we had systems online in many of their plants in North America. They actually control parts of the plant operation off the cameras. Another project was with DuPont Canada. At their site in Maitland, Ont., they had a waste boiler to generate steam by burning many of their waste liquid streams. They had a camera which was used by the operators simply to ensure the flame was still lit. We decided to digitize that flame image. They had a hard time believing we could do anything with that data, because the flame was bouncing all over the place. But we showed that regardless of the turbulence, the spectral data gave a consistent picture of the energy content in the feed stream, as well as what pollutants were going out the stack. So we could use multivariate analysis to predict and control that. We’ve used that same technique with Irving Pulp and Paper in monitoring lime kilns, and with steel companies in their oxygen furnaces.
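A rough illustration of why the bouncing flame does not matter: the flame’s shape changes frame to frame, but colour features averaged over a short window of frames are comparatively stable, and those averages can be regressed against a measured quantity such as energy content. Everything below, including the feature choice, window length and the fake “true” relationship, is hypothetical.

```python
# Time-average per-frame colour features, then regress energy content on
# the averages with a latent-variable model. Data are simulated.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
n_windows, frames_per_window = 200, 30       # invented window sizes

# Per-frame features, e.g. mean R/G/B intensity and fraction of bright pixels
frame_feats = rng.random(size=(n_windows, frames_per_window, 4))
X = frame_feats.mean(axis=1)                 # averaging tames the turbulence
energy = X @ np.array([2.0, 1.0, 0.5, 3.0]) + 0.05 * rng.normal(size=n_windows)

model = PLSRegression(n_components=2).fit(X, energy)
print("predicted energy, first window:", model.predict(X[:1]).ravel())
```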
ACCN In 2004, you launched ProSensus Inc. Why was it the right time to do that?
JM I had thought about doing it before, but it was mainly in order to spin off the multivariate image analysis. One of my PhD students was graduating and wanted to continue working in this area. However, there was no company that really understood it well enough to continue. So we spun it off as a company, to open up job opportunities for graduates from our program, and to be able to advance these technologies further with member companies from the consortium.
We quickly found that some companies weren’t quite ready to make a leap into the advanced imaging work. So we went back to multivariate methods for extracting data for other purposes. One area that we got into was rapid product development — how companies use all the data they’ve got on raw materials, formulations, processes and quality control to develop new products with desired properties.
ACCN Such as?
JM An example of early success in this field was a project on advanced polymeric materials, which we did in partnership with Mitsubishi. Some of these materials were for medical devices, while others were for specialized applications like golf balls. Golf balls contain multiple rubber cores to control the distance, spin and about a dozen other properties. We built statistical models that told them to use formulations containing raw materials that they had never used before, but that we predicted would have the desired properties. They tried it, and met the specifications almost right off the bat. Our methodology was used to develop all the core functional polymers in the Srixon golf ball; about 10 per cent of the world's touring pros now use that ball. A company like Mitsubishi would typically take two to four years to develop new products, but with our methods we can get that down to a couple of months.
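The workflow MacGregor describes can be sketched as model inversion: fit a model from historical formulations to measured properties, then search the formulation space for a candidate whose predicted properties match a target. The toy version below uses generic least-squares inversion with scipy, not ProSensus’s methods; the ingredient count, bounds and target values are all invented.

```python
# Toy model inversion: find a formulation whose predicted properties
# match a desired target. Data, bounds and target are illustrative.
import numpy as np
from scipy.optimize import minimize
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
formulations = rng.random(size=(100, 6))            # 6 ingredient fractions
props = formulations @ rng.normal(size=(6, 3))      # 3 measured properties

model = PLSRegression(n_components=3).fit(formulations, props)
target = np.array([1.2, 0.4, 0.9])                  # desired property vector

def mismatch(x):
    pred = model.predict(x.reshape(1, -1)).ravel()
    return np.sum((pred - target) ** 2)

result = minimize(mismatch, x0=formulations.mean(axis=0),
                  bounds=[(0.0, 1.0)] * 6)
print("suggested formulation:", result.x.round(3))
```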
Multivariate statistical methods can be applied to databases of raw materials, previous product formulations and processing conditions to design new products, such as high-performance cores for golf balls.
ACCN Should we approach statistics differently than we currently do?
JM It’s coming, but very slowly. I think part of the problem is that there are very few people trained in these multivariate methods. That’s why ProSensus developed multivariate software, which we make available free to universities. McMaster has introduced some of this into its undergraduate courses, but most universities are using it at the graduate level. American universities are behind the Canadian ones; very few American engineering schools have statistics as a required course, even for undergraduates. A lot of that is because they are really gearing their students up for graduate work, not for dealing with industrial data or everyday problems. So while many companies do use this stuff, engineers more or less have to learn it after they graduate.
ACCN What has kept you motivated about multivariate techniques all this time?
JM For me, the interest — and the reason I formed ProSensus even as I was getting close to retirement — was in seeing this stuff through and having it applied in industry at a more rapid pace. It’s exciting to do this research, but if you just publish a paper on it and it never gets used, it’s not very satisfying. We’re taking it beyond the published literature, developing new products in a fraction of the time it used to take, and creating control systems to do things that nobody’s ever done before. It has taken off extremely well, and that’s been deeply satisfying to see.