An Introduction to Technical Language Processing: Unlocking Maintenance Knowledge

Mar 17, 2021

Maintenance is an important component of any successful manufacturing facility, and manufacturers are constantly looking for new ways to improve their maintenance procedures. Most research efforts in maintenance have traditionally focused on improving predictive maintenance capabilities by applying Artificial Intelligence (AI) or similar techniques to sensor data. Recently, researchers have started to analyze other data sources, such as text-based data. Text-based documents account for a significant portion of the data collected during the life cycle of an asset. These documents contain important information about the asset and its maintenance history, which has previously gone untapped for analysis. Traditional Natural Language Processing (NLP) solutions need re-imagining to understand and meet the requirements of these datasets. This presentation introduces a methodology for adapting NLP to technical engineering text-based data, called Technical Language Processing (TLP), and illustrates TLP through a discussion of Nestor, a free, open-source toolkit for maintenance analysis.



Transcript:

Michael Brundage:

Hello everyone. Welcome to my talk, An Introduction to Technical Language Processing: Unlocking Maintenance Knowledge. I'm Michael Brundage, and I'm here from the National Institute of Standards and Technology, NIST. I'm the associate program manager for the Model-Based Enterprise program, and I'm also the project leader of the Knowledge Extraction and Application for Manufacturing Operations project.

Michael Brundage:

Before I get started, one quick disclaimer: the use of any products described in any presentation does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products are necessarily the best available for the purpose. So what is the problem we're trying to solve? Well, maintenance is expensive. In manufacturing alone, $50 billion was spent in 2016 on maintenance.

Michael Brundage:

Maintenance is also very expertise driven. Smart manufacturing technologies can help reduce the cost of maintenance within manufacturing; however, small- and medium-sized enterprises, SMEs, are still not employing these technologies. And why? Well, one, there's a high cost to implement. Also, the risk is very high if there's an incorrect implementation. There's also a lack of support and expertise in manufacturing for these specific technologies.

Michael Brundage:

This often leads to a lack of high quality sensor data. Without this data, it is frequently very, very hard to improve maintenance work processes. But there is an untapped source of data that could be used: natural language documents generated by humans. There are a lot of different types within manufacturing, but today we're going to talk about maintenance work orders, MWOs. These MWOs are a health history of an asset. They contain historical tacit knowledge from your maintenance technicians.

Michael Brundage:

However, there are domain specific abbreviations. There's jargon and it's often very unstructured. So current out of the box natural language processing solutions do not always work. So what is the current paradigm with these maintenance work orders? Why are they this way, and why are they so hard for natural language processing solutions to work through out of the box? Well, sometimes you may have physical work orders that are actually handwritten. Sometimes these handwritten work orders are put in things like Excel where they are then stored and frequently not analyzed.

Michael Brundage:

There are also proprietary solutions, computerized maintenance management systems, that will also collect this data. However, even with all these different solutions, there are still a lot of similarities in the different types of data and what is actually written down. Sometimes people may write down a lot of information. Others, not as much. But there's almost always value in what is written to understand the history of the asset.

Michael Brundage:

The technicians are writing a lot of information that can be used to analyze and improve the maintenance process. So the question now becomes: how much value is there? Does it matter, and is it worth our time to really analyze this data? Well, we first looked through three months of maintenance work orders, about 800 data points from a manufacturer in the United States. This manufacturer was an automotive supplier for a large automotive company.

Michael Brundage:

We looked at the raw data and we wanted to find out what were the commonly listed observations. So as you can see here, the number one issue that they wrote was accumulator check requested. This took about 16 hours of time to complete. We then went through and we cleaned the data. We looked and saw, okay, what was the actual problem that was written. Sometimes these things were written in different ways. When we did that, we noticed there was an uptick of about 12 more work orders with accumulator check requests, but it more than doubled the amount of time that was spent.

Michael Brundage:

What was more interesting was the number one issue in the clean data: hydraulic leak. This doesn't show up at all in the raw data. Why is that? That is because it was written 39 different ways. The plant did not realize that hydraulic leaks were their number one issue. It was the number one problem for the three month span. They could not understand the data themselves when analyzing it, so we went back and said, how do we develop a solution that could help them analyze this data and further understand their facility?

Michael Brundage:

So first it's important to understand how we went through and cleaned this data the first time. We looked at this data, the raw data which involved a description and a resolution from the plant. We decided that it was important to look and see what was the cause, what was the effect, and what were the solutions implemented for each specific work order. Now, to do this by hand, you have to go one at a time.

Michael Brundage:

So as you can imagine, this takes a long time. For the first work order here, the effect that we determined was hydraulic [inaudible 00:05:16] attachment. The solution was to replace the seal and repair it. Now doing this one at a time is very tedious, very boring, and doesn't really give you a lot of bang for your buck. In fact, it took 12 hours for 800 maintenance work orders. This is not scalable.

Michael Brundage:

As we said earlier, NLP techniques do not always adapt well to engineering text. However, the tools they provide can be powerful in this fight to analyze this data. But they need to be adapted correctly. That's what's important. So enter technical language processing, TLP. We recently wrote a paper around this in Manufacturing Letters, as you can see below. TLP is a methodology to tailor NLP solutions to engineering text and industry use cases in a scalable and reproducible way, to help you analyze this data.

Michael Brundage:

So what does this look like in practice? I'm going to show you a methodology called ranked tagging, which is done with a tool called Nestor that we developed at NIST. So if you look at the original work orders, which are exactly the same as what I showed before, instead of going through and determining one at a time, what if we took a step back and said, what are the words that are used throughout all of the work orders and rank them in terms of what is more important.

Michael Brundage:

Instead of having to go one at a time, we can now look word by word. What if we took those words and started to classify them, start figuring out what their type is? So we could start thinking: is this a solution action, things like replace or repair? Is this a problem action, things like leak or bad? Or is this an item, things like gage or hydraulic? Once we do that, we can start building out a thesaurus linking these words together. But as part of that, we also need to include things like misspellings, jargon, and also those abbreviations that we mentioned earlier.
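
To make the ranking step concrete, here is a minimal sketch in Python: count every token across the corpus and review the most frequent words first. The work-order strings are hypothetical examples, and raw frequency is a simplification standing in for however the real tool scores importance.

```python
from collections import Counter

# Hypothetical raw work-order descriptions, misspellings included.
work_orders = [
    "hyd leak at gage, replaced seal",
    "hydrolic leak, repaired fitting",
    "replace hyd hose, leak repeared",
]

# Count every token across the whole corpus.
counts = Counter()
for order in work_orders:
    for token in order.lower().replace(",", " ").split():
        counts[token] += 1

# Reviewing words in descending frequency means the minute spent
# classifying "leak" or "hyd" pays off across every order that uses them.
for word, count in counts.most_common(10):
    print(f"{word:10} {count}")
```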

Michael Brundage:

If we take the word repaired as an example, we can see on the right side we can start linking things to those abbreviations, misspellings and jargon, things like repears. We know that's a misspelling of repairs or repair, and we can link those together. But we may not be sure about a word like rep. Maybe that's replaced, maybe that's repaired. We can't be sure until we have more context, and Nestor can provide that.

Michael Brundage:

However, this time we may say we don't want to link those things together. Once we link those misspellings, abbreviations and jargon, we can also classify. So for repaired, we know it's a solution. We can do that for every other word within the work order. You can see that things like repaired and repeared are now linked through one alias of repair, which is picked by a user, but you can also see that the user said, "I don't like the word bad. I want that to mean broken." Or, "I don't like the word hyd. I want that to mean hydraulic."
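
A minimal sketch of what such a thesaurus can look like in code, using the talk's examples: each raw surface form maps to a user-chosen alias and a classification, and an unresolved token like rep stays unlinked until a user decides. The data structure is illustrative, not Nestor's internal representation.

```python
# Illustrative thesaurus: raw surface form -> (alias, classification).
thesaurus = {
    "repaired": ("repair", "solution"),
    "repeared": ("repair", "solution"),  # misspelling, linked by a user
    "repairs":  ("repair", "solution"),
    "bad":      ("broken", "problem"),   # user preferred "broken"
    "hyd":      ("hydraulic", "item"),   # abbreviation
    "hydrolic": ("hydraulic", "item"),   # misspelling
}

def normalize(token):
    """Return (alias, type) for a known token, or None if unresolved."""
    return thesaurus.get(token.lower())

print(normalize("repeared"))  # ('repair', 'solution')
print(normalize("rep"))       # None -- ambiguous until a user decides
```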

Michael Brundage:

So you may be asking yourself at this point: that's great, but how many misspellings are there? How many abbreviations? How much jargon is there really within my corpus? Honestly, there's a lot. What we're going to show here are the raw words that are used within the MWOs. Things like replace are there, which is normal, but you also have things like PM, which is an abbreviation, or hyd, which is also an abbreviation. Further down, you have other misspellings and the like.

Michael Brundage:

Now what we could do here is we insert the tags, what we just did in the previous step. Those linkages between the different concepts, the different usages of the word, the abbreviations, the misspellings, the jargon. What if we zoom in on one specific example, leak, and show what that looks like with all the different types of usages within one dataset. So what we see here is the word leak spelled correctly and, in this tense, was used 16,286 times. What you see with the gray are the concepts that are equivalent to leak. They were used 9,984 times.

Michael Brundage:

What are those things that mean the same thing as leak, but may be misspelled, or may be a different usage, a different tense, an abbreviation or jargon? Well, leaking is 8,293 times. Leaks is 1,478 times. Then you also have leakage, leaking, leak and lake, 211 times. All of those usages together are 40% of the concept of leak. So if you only search for the word leak spelled L-E-A-K, you will only get 60% of all the problems that involve leak within your facility.
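
The arithmetic behind that 60/40 split is worth seeing worked out. This snippet uses the counts quoted above; the itemized variants sum to 9,982, essentially the 9,984 total mentioned.

```python
canonical = 16_286              # "leak" spelled correctly, this tense
variants = 8_293 + 1_478 + 211  # leaking, leaks, misc. variants

total = canonical + variants
print(f"searching 'leak' alone finds {canonical / total:.0%}")  # ~62%
print(f"the variants hide the other {variants / total:.0%}")    # ~38%
# -> roughly the 60/40 split described above
```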

Michael Brundage:

Once we go through and link those words to their classifications, things like problem, item, solution, and also create an alias which links those concepts together with their abbreviations, their jargon, their misspellings, their different usages of tense, how does that go back to the work order? Well, now we can use a computer to go through and tag each individual work order with the thesaurus that we developed behind the scenes, and that is much, much faster.

Michael Brundage:

So now a computer can go through and see that hyd and hydraulic are linked, and the item is hydraulic. We can also see and do this for other things as well, like replaced, and complete the entire work order corpus with what we just had. What we showed before was only for item, problem and solution. You may be asking yourself, well, how can we understand what items have what problems, or what items have what solutions? Now we can start developing rules that will link these things together.
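
A sketch of that tagging pass, reusing the illustrative thesaurus from before: once the links exist, a computer can apply them to every work order in a single pass. Again, this is a simplified stand-in, not Nestor's actual code.

```python
# Same illustrative thesaurus idea as before, applied back to the corpus.
thesaurus = {
    "hyd":      ("hydraulic", "item"),
    "hydrolic": ("hydraulic", "item"),
    "leak":     ("leak", "problem"),
    "replaced": ("replace", "solution"),
}

def tag(order):
    """Collect the item/problem/solution aliases found in one work order."""
    tags = {"item": set(), "problem": set(), "solution": set()}
    for token in order.lower().replace(",", " ").split():
        if token in thesaurus:
            alias, kind = thesaurus[token]
            tags[kind].add(alias)
    return tags

print(tag("hyd leak, replaced seal"))
# {'item': {'hydraulic'}, 'problem': {'leak'}, 'solution': {'replace'}}
```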

Michael Brundage:

So, for example, if we look at cut off and unit, any time these occur next to each other, we can say an item plus an item is still an item, but it's a new type of item, so it's a cut off unit. We can also link things like problem and item and say this is a problem item pair. In a lot of computerized maintenance management softwares, these are things like problem codes. We could also have the equivalent for solution, so we can look at solution and items together.
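
Here is a minimal sketch of such pairing rules, following the cut off unit example: scan adjacent tagged words and combine them when a rule matches. The rule table and helper function are hypothetical.

```python
# Hypothetical pairing rules over adjacent tagged words.
RULES = {
    ("item", "item"): "item",              # cut off + unit -> cut off unit
    ("problem", "item"): "problem_item",   # like a CMMS problem code
    ("solution", "item"): "solution_item", # e.g. replace + fitting
}

def pair_tags(tagged):
    """tagged is a list of (word, type); return combined adjacent pairs."""
    pairs = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        combined = RULES.get((t1, t2))
        if combined:
            pairs.append((f"{w1} {w2}", combined))
    return pairs

print(pair_tags([("cut off", "item"), ("unit", "item")]))
# [('cut off unit', 'item')]
print(pair_tags([("replace", "solution"), ("fitting", "item")]))
# [('replace fitting', 'solution_item')]
```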

Michael Brundage:

So if I want to know how frequently I replace my fitting, all I need to know is whether they occur together. Now we also may come up with things like solution plus problem. However, this might not be necessary, because we already capture that data when we link replace and solution, and we link problem and missing. We don't need to know they're together because they will already be annotated in the work order set.

Michael Brundage:

This also helps us if we are unsure of a single word. So for example, the word hot: I may not know by itself if this word is an item, or it may be a problem. So if we look on the right side, when those words are linked with other words, too hot or hot water, all of a sudden we can start saying, well, too hot is a problem, but hot water is an item. It is something that occurs very frequently in things like heating, ventilation and air conditioning, HVAC, where, with hot water pipes, hot water means it is an item.
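
A sketch of that context-dependent classification: hot alone stays unknown, but one word of context on either side resolves it. The bigram table is illustrative.

```python
# Illustrative bigram table: context decides the type of "hot".
BIGRAMS = {
    ("too", "hot"): "problem",  # a comfort complaint
    ("hot", "water"): "item",   # part of the HVAC system
}

def classify(words, i):
    """Classify words[i] using one word of context on either side."""
    if i > 0 and (words[i - 1], words[i]) in BIGRAMS:
        return BIGRAMS[(words[i - 1], words[i])]
    if i + 1 < len(words) and (words[i], words[i + 1]) in BIGRAMS:
        return BIGRAMS[(words[i], words[i + 1])]
    return "unknown"  # leave for a human to resolve

print(classify(["room", "too", "hot"], 2))    # problem
print(classify(["hot", "water", "pipe"], 0))  # item
```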

Michael Brundage:

So it is very important to add context when you're tagging this data. Conceptually, this is what it looks like. We now can see all of the items in blue, all of the problems in red, and all of the solutions in gray, and we can start linking them together and getting more information about our work order. You may be asking yourself, well, this is going to take too long. But in our experiment it took 45 minutes to go through those words, classify them, create an alias and link them, and 3,100 work orders were completely tagged out of 5,600 total.

Michael Brundage:

What's even more interesting is the way that we go about it. 5,400 work orders were partially tagged, meaning they had at least one item, one problem or one solution. They may not have everything, but they have more information than they would have had had they not been cleaned. That's amazing.

Michael Brundage:

So what does our method look like compared to other methods? Well, like I showed you earlier, manual cleaning takes [inaudible 00:14:47] 12 hours, 800 labels, and honestly they're not always that great. I was the one that went through and did it, and I can tell you, by the end, it is very hard to be consistent. Did I put hydraulic leak? Did I put leaking hydraulic? Maybe you don't remember. Also, it's a lot of cognitive load for someone. Is this a cause? Is this an effect? It's very difficult to understand just from reading that information.

Michael Brundage:

We can also take that data and use it as training data and input it into a machine learning model. Now, when we did it, we didn't have enough labels. 800 is not enough, even for that 5,600 work order set. It just will not learn enough from it. However, we have worked with other collaborators who have done this at a much higher scale. But you need a lot of labels. They have on the order of millions, which takes a lot of time and resources.

Michael Brundage:

The other thing you could do is create a rules-based system, but this takes months. So of those 5,600 work orders that we were describing before, 5,485 were completely classified, but it took months to develop this system. We could also use the method we applied manually, problem-item-solution: go through one at a time and determine for each work order which words are associated with them. That still takes about 12 hours for 1,200 texts. Or we could do the method that I really went into detail around with Nestor.

Michael Brundage:

In one hour or so, 3,100 were tagged, 5,400 were partially tagged, and it honestly does not take that much work. Now that we've shown how easy it is to tag with Nestor, what can you do with it? Well, we've done a variety of case studies with different datasets across the years. This one here is looking at machine performance. We wanted to find out what the time between failure was for each machine. This company did not have robust data on when the machines were down.

Michael Brundage:

So we looked at the work orders. We wanted to see the time between the word broken and the word replace, and we can start getting an estimate of the time between failure, and also the survival curve, so we can start estimating when it may go down again. But we also can go a step further. Once we know those mean times to failure, we can also understand what went wrong. So for example, in H34, we may see that replaced and unit are the number one things. We also may see that I19 has a lot of alarms, and there are things related to sensors that are constantly brought up.
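
A sketch of that estimate, following the description above: measure the gap between a broken tag and the next replace tag on the same machine. The machine IDs and dates are made up for illustration.

```python
from datetime import date

# Hypothetical tagged orders: (machine, date, tag).
orders = [
    ("H34", date(2020, 1, 5), "broken"),
    ("H34", date(2020, 1, 7), "replace"),
    ("H34", date(2020, 3, 2), "broken"),
    ("H34", date(2020, 3, 6), "replace"),
]

# Pair each "broken" with the next "replace" on the same machine.
gaps, last_broken = [], {}
for machine, when, tag in sorted(orders, key=lambda o: o[1]):
    if tag == "broken":
        last_broken[machine] = when
    elif tag == "replace" and machine in last_broken:
        gaps.append((when - last_broken.pop(machine)).days)

print(f"mean days broken -> replace: {sum(gaps) / len(gaps):.1f}")  # 3.0
```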

Michael Brundage:

H14 has a lot of operator issues, so is there a problem with the operator at this machine? We also can see on the right that there are different ways we can visualize it. We can cluster things together and see how they're interrelated, so maybe we can come up with different ways to fix things.

Michael Brundage:

The other thing that's great about this method is that it's cross domain. We worked with our HVAC company on campus to analyze those "too hot" issues that I mentioned earlier. They frequently come up in our work orders because people on campus are constantly complaining it's too hot or too cold in their office, and people have to go out and fix it. So we wanted to track over time how frequently these occurred and when.

Michael Brundage:

But we also wanted to see where they occurred. Were there any problem buildings on our campus, as you see here on the left, where a lot of people were complaining? You may think to yourself, well, this is pretty simple, why do you even need to do this? However, they send someone out every time there's a complaint. So if we can start lumping those together at times of the year when it is too hot or too cold outside, when we know these complaints are going to come in frequently, we may be able to save money on scheduling. We may be able to make it more efficient.
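
A sketch of that aggregation idea: bucket tagged complaints by building and by month to spot problem buildings and the seasons when visits could be batched. The records are hypothetical.

```python
from collections import Counter

# Hypothetical tagged complaints: (building, month, problem).
complaints = [
    ("Bldg 101", "2020-07", "too hot"),
    ("Bldg 101", "2020-07", "too hot"),
    ("Bldg 226", "2020-01", "too cold"),
    ("Bldg 101", "2020-08", "too hot"),
]

by_building = Counter(b for b, _, _ in complaints)
by_month = Counter(m for _, m, _ in complaints)

print(by_building.most_common(1))  # the problem building
print(by_month.most_common())      # months where visits could be batched
```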

Michael Brundage:

Now that I've shown the value of tagging your data, cleaning it up, and also what you can do with it, you may be wondering, how can I do this myself? As I mentioned earlier, NIST developed a free, open-source tool called Nestor, which goes through the methodology I was describing earlier. If you look at this page here on the bottom, you can find the software. If you have questions, please do not hesitate to reach out to us. The code is completely open source on GitHub, so if you want and have developers, they can look through it themselves.

Michael Brundage:

The link in the previous slide will bring you to this page. As you can see here, we have the source code, as I mentioned earlier. There are also links to our publications, which go into more detail around Nestor and the method behind it. There's also a link to a dataset, the 5,600 work orders I was mentioning earlier. This dataset is publicly available, but it's from the mining industry. Now, I mentioned there are a lot of similarities between different domains; however, we are looking for more rich open-source datasets that we can publish to help researchers like ourselves and academics develop new and better solutions.

Michael Brundage:

The entire Nestor process is an example of technical language processing. As I mentioned earlier, we just published a paper in Manufacturing Letters detailing this process. This is a figure from that paper. As you can see, you take in raw text and use cases in box A and move them into this process. However, we also do use NLP resources. I don't want you to go home thinking that NLP is not important to analyze this data. It is.

Michael Brundage:

However, as I mentioned, we need to adapt it correctly to our engineering use cases. This process involves both a domain expert and an analyst, and it uses community developed TLP resources to help these people analyze the data. It's a constant feedback loop. It's important to constantly be asking the question of how do I improve this process and how do I make it better.

Michael Brundage:

So what comes next? Well, we just developed a technical language processing community of interest, and we're having our first meeting the week of April 12th through the 16th, at 4:00 to 6:00 p.m. Eastern time. If any of you are involved in the MBE Summit, you may notice this is the same week. We are collocated with that conference. We are not overlapping with any other material.

Michael Brundage:

Our goal is to still continue to pilot Nestor within industry. We've worked for years to develop the software, and we've worked with companies to make it better, and we want to continue to do so. We are also developing standard guidelines through the ASME Prognostics and Health Management subcommittee. I hope you can come join us at our first meeting in April. Our first day will encompass the value of TLP. We will have more industry participants discussing how TLP has been valuable to them.

Michael Brundage:

On day two, on Tuesday, we will talk about more tools in TLP, one of those being Nestor. On Wednesday, we will talk about the need for TLP datasets. We will have various companies describing why they've provided data to data competitions in the past and how it's helped them. On Thursday, we'll talk about creating the necessary resources. As I just mentioned, the community must come together to work on these resources to improve the TLP process.

Michael Brundage:

Last, on Friday, we will talk about next steps: what do we want this community of interest to be, and how do we want it to move forward? Thank you again for attending my presentation. I welcome any questions. If we do not get to those questions now, please do not hesitate to email me; you can see my email here. As I said, our first meeting of the TLP COI is April 12-16. Please come attend.

Michael Brundage:

Last but not least, if you want to work with Nestor, as I mentioned earlier, here is the link. Thank you again and I welcome any questions.

Stephen LaMarca:

Thanks for that presentation, Michael. That's actually really cool. It was fascinating learning about how it can pick up on just grammatical errors and even spelling errors. What's also kind of scary about that is, what if a technician doesn't use a technically accepted word? I don't mean like foul language, but I mean what if, instead of broken or replaced, a technician writes in their notes, oh yeah, the spindle was totally kaput?

Michael Brundage:

Right.

Stephen LaMarca:

Does Nestor know to pick up on jargon like that?

Michael Brundage:

So that's one of the reasons we want a human in the loop for these systems, because Nestor, or even just traditional NLP software and tools, wouldn't. But if we could present the information in such a way that we ... so either an expert, like a maintenance technician or a maintenance manager, can start linking those terms. Now, if it doesn't come up and the manager sees kaput, they can go back to the original work order and see that maybe you used that phrase, and now you can start linking that to broken.

Stephen LaMarca:

Gotcha. So just like when we were talking with Josh and his robotics vision systems, there is a lot of human involvement required to zero in on the efficiency of this technology.

Michael Brundage:

Right. What we wanted to make sure was to limit the amount we ask of the human and use them for the right tasks. As I said in the presentation, I was the one that went through and manually annotated, and I don't know if you've ever cleaned data before, but it's awful.

Stephen LaMarca:

It is a nightmare.

Michael Brundage:

Yeah. Like I was mentioning, all those consistency mistakes. I have a person on my team that constantly makes fun of me. He's like, "You know your data cleaning was not that great in the long run." I was like, "I know, but we learn from it, so that's all that matters." But instead of doing that, we're really asking kind of yes or no questions. Is this linked to this and what is the classification of this word? That's not a yes or no question, but it's a pretty simple question to answer as opposed to going through an entire work order and saying, what was the cause, what was the effect, what was the solution.

Stephen LaMarca:

Right.

Thomas Feldhausen:

So you brought up an interesting thing, Steve. So let's say with spindles. There are a lot of people that do spindle analytics. They look at sensor data. I think there's a lot of value with this text based data where you don't have this steep learning curve. Do you see, looking three to five years down the road, Michael, is integrating this with some of the system analytics, MTConnect data, is that beneficial?

Michael Brundage:

Absolutely. That's something we're actually investigating. We've done some work through a cooperative agreement with TechSolve where we've generated an experiment where we're generating that data and then also generating, on the side, observations and work order types of data, and looking into how to analyze that. We don't think that the work orders are going to outright replace sensor data. That would be insane.

Michael Brundage:

One, it will help inform where to place sensors if you don't have that data. Two, I think it will also give you kind of validation. One of the things we've looked into is, if let's say something breaks on your spindle, you want to go back and say, oh well the work order was initiated at this time, so it kind of limits your timeframe maybe when breakage happened or when it was replaced or repaired, whatever.

Thomas Feldhausen:

Yeah. No, I think you're completely right. Algorithms like this and this type of workflow can really help mitigate a lot of the steep learning curves with looking at processing data.

Stephen LaMarca:

This next question of mine, I know I should probably close our Q&A session with it, but I kind of want to get to it right away. As your presentation concluded, I thought to myself, I probably don't have the best hands on manufacturing experience, especially with maintenance work orders, to have any say in the development of this, but how can somebody like myself, or somebody who's actually in the weeds with this, get started in using it?

Michael Brundage:

So there's a number of ways. I think, one, we welcome those kinds of perspectives because we want to get that buy in. Even if you're not involved necessarily in the maintenance process, if you're an expert with the machine or whatever it may be, we want to talk to you. The number one thing a lot of companies tell us when we get started is, well, why don't we just force the technicians to use common language or a standardized language?

Michael Brundage:

One of the big automotives told us they tried that and that there was almost a revolt because it's just really difficult. You're not paying these technicians to write down good data. You're paying them to get their job done and move on. The data is kind of a subset of that. So I think what we want to do is make sure these tools work with those people and are not really forcing them to do extra work that is tedious. It's really maybe a quick ... The previous talk was about doing that kind of learning in the beginning, and then you don't have to keep doing it.

Michael Brundage:

Same thing with this. Once you link kaput to broken, you don't have to do it again. You're already done.

Stephen LaMarca:

So how does somebody like myself get started with Nestor? Would that be ...

Michael Brundage:

So I mean-

Stephen LaMarca:

Okay, go ahead.

Michael Brundage:

Go ahead. Sorry.

Stephen LaMarca:

I was going to say would that be going to the NIST website and downloading whatever software is available for Nestor or attending the first meeting coming up soon?

Michael Brundage:

So I would argue both, but I'm a little bit biased. So we have an executable for Windows users, but we also provide for Mac as well. [inaudible 00:29:08] not an executable, but a package that you can download and run right away. Then we're continually developing and putting out the source code. So if you're not necessarily computer ... Or sorry, if you are a computer scientist, you could go through and see. We welcome any kind of feedback on that as well.

Michael Brundage:

Then, at the meeting, one of the days, as I mentioned, will be around tools in TLP. So Nestor is only one tool. We'll actually have five different tools that will be presented that do similar types of things for annotating this type of data.

Stephen LaMarca:

Gotcha. All right, we actually have a question in from Dr. Pavel. He says, "Good presentation and description. This seems to be one of the first such efforts for manufacturing. Are you aware of any other similar efforts or tools in the US focused on annotation and TLP for manufacturing?"

Michael Brundage:

So coming straight out of manufacturing, I think there are a couple. I think one of the presenters that will be at our workshop is a company called RedShred, and they're actually doing something similar. We also have ... even though it's not manufacturing, I mentioned that this is pretty cross domain, so there's another collaborator we work with out of the University of Western Australia that has a tool called RedCoat that also does something similar.

Michael Brundage:

They're not reproducing exactly the same workflows, but they're all trying to help you annotate your data better. Also, we did bring in for the workshop someone from NIH because, as you'd imagine, when you have doctors' notes and things of that nature, it's pretty close to what we're doing in maintenance. It's obviously a lot more data and a lot more private, but we've brought someone in that has been developing tools in that space as well, because we think it's useful to learn from that community. They're way further ahead, I think, than we are in this space, and we can try to adapt what they've done and improve upon it.

Stephen LaMarca:

I would assume that the NIH would like some sort of software technology that can help read doctor's notes or even understand signatures from doctors.

Thomas Feldhausen:

Yep. So building on that, we're all very interested in manufacturing and slightly biased towards it, but in your opinion Mike, what is the most interesting industry that this could be applicable towards?

Michael Brundage:

I'm also biased towards manufacturing, so my dissertation was all manufacturing. I'm not an NLP expert. I don't want to claim I am. I did a lot of manufacturing systems research. I think it's interesting to manufacturing because we are starting to ... We've been really great at starting to collect that data, and I think more and more people want to collect more data. I think they have these work orders and stuff sitting there and they're not using them.

Michael Brundage:

The first question that kind of led to this work was a company asking us, how do I get smart? It's such a loaded question and also an unanswerable question because it's like, how do you even start with this? So this started because they sent us these work orders, and I was naïve and thought, oh, this will be great. They'll be really consistent and easy to parse through. And then we spent months, if not years, developing this because we saw there was a need in this space.

Michael Brundage:

So I think manufacturing and engineering in general ... I think the medical industry is close, but it's not the same type of people. There may be overlap, but I think it's finding that common ground. But I think it's really adapting these solutions to engineering text that is interesting to me.

Stephen LaMarca:

Oh man. All right, let's see. We don't have any more questions, but Michael, I wish I had more for you because this really is fascinating. I'm sure it can be, as Thomas said, it can be applied to other applications outside of manufacturing, because it certainly doesn't seem dedicated to manufacturing. I feel like there's a lot of people in software development and IT that would probably love to have something like this as well.

Stephen LaMarca:

Do you see it branching out in the future, in the near future?

Michael Brundage:

I hope so. I think that's one of the reasons we provide the code. We're NIST. We're a large organization, but it's a small team working on this, and that's why we need it to work with industry. I also think it will branch out hopefully more than just maintenance use cases. There are so many natural language documents even in manufacturing alone. Being able to figure out what's the best way to parse through that, because Nestor doesn't work for everything.

Michael Brundage:

If you have long form text, it can work, but it's not built for that. So that's what this community of interest is for: to figure out what those use cases are, as I just talked about earlier. Don't just collect data to collect data. Understand what your use case is up front and then figure out what the solution is to get the result you want. That's what we really want to be able to do: link those use cases and the solutions you want to the right NLP and TLP technologies so that people can get reproducible results.

Stephen LaMarca:

Do you see ... Well, as different people use different terminologies and different jargon, does Nestor actually collect and compile each of the terms used and start developing libraries of synonyms for a thesaurus, if you would? Is that list of definitions and glossary of terms, if you would, accessible to all users?

Michael Brundage:

So that's not something Nestor does out of the box, but that is something we're investigating. We're working with a lot of ontologists to try to see if we can use this as a data-driven method to help develop those types of ontologies. The other thing we've investigated is, let's say I give you Nestor and you go through a work order set and you start tagging: you may link words differently than me and differently than Thomas.

Michael Brundage:

So we have also done some research in well how do we resolve that. If the three of us have different classifications or different linkages, who is right? Is anyone right? Does it matter? It really may depend on the analyst or what you're looking to get out of it. So we don't have it where it exports it for you, but we are doing research in that space.

Thomas Feldhausen:

Have you found ... oh go ahead, Steve.

Stephen LaMarca:

No, you go ahead, Thomas.

Thomas Feldhausen:

I was going to ask, have you found an instance where Nestor has become proactive? So let's say you have this work cell that has a lot of kaputs a month. Have you seen it get to the point where you see one work order and then you say, maybe I need to go replace this before it happens again?

Michael Brundage:

So that's a good question. There are, I guess, two parts to that question. If you start having kaput appear often, it will actually rise to the top in the TLP, so you would start annotating that quicker. From the analysis side, that is something we're looking into. I mentioned the HVAC work we're doing on our campus. It's not just the people that go out and do the maintenance. We have researchers in a different division that are interested in how you push HVAC research forward.

Michael Brundage:

As part of that, they have a fault detection mechanism, so we're using the work orders and seeing what are the machines that have failed frequently and what will give them the rich data they need to do their fault detection. So it's not the same exact example, but it is something we're looking into to being proactive so they can use their technology to predict exactly what you mentioned.

Stephen LaMarca:

I'll have to ask Rus if kaput is going to be added to the MTConnect list of definitions. Thinking about that, I've been exposed to a lot with MTConnect. MTConnect is based around a compiled list of definitions with respect to manufacturing. I feel like a technology like this, a software like Nestor, could seemingly break down the need for a list of definitions and make a list of definitions, not useless, but such that it doesn't need to regard them anymore.

Stephen LaMarca:

But at the same time, I feel like Nestor is also breaking down traditional definitions just to rebuild them back up.

Michael Brundage:

Yeah. In general, we've tried to approach that problem. Again, Nestor is not necessarily the only way. I'm part of a standards committee, as I mentioned, and I always laugh. The first time we were meeting as a committee, we had a professor that came and said, "Oh, definitions, they'll take like 20 minutes." Everyone that's been part of standards laughs, because definitions end up taking you days and days and days. It's awful.

Michael Brundage:

So we have investigated, again, Nestor doesn't necessarily do this, other NLP methods to go through and see those commonalities of definitions and try to figure out what we are actually saying here. Nestor could come in behind the scenes, as you mentioned, and see how this word is used in practice, because we may have the same definition, but the word in practice ... I keep going back to kaput; it may mean something very different on dictionary.com than it does in this context.

Stephen LaMarca:

Then you'd have to consider Urban Dictionary as well. But we're talking about the definitions, and I think you just touched on context and, more importantly, semantics. Does Nestor take that into consideration as well?

Michael Brundage:

Yeah. So one thing I didn't show, and I wish I did, is, when you hover over a word as an example ... so I used rep as an example in the slide deck. When you hover over the word rep, we pull out actual usages of that word within the documents. In the example of the work orders that I had mentioned, it actually is rep\lace, because they had a misspelling. We know it's replace, not repair.

Michael Brundage:

So you can provide ... we provide three work orders, when you hover over the word, where that word was used, because we do find that some words are pretty unambiguous. Replace is always a solution. We know what it is 99% of the time. But with the hot example, you don't, because, in HVAC, they do mean two very different things. Too hot is equally as important as hot water, because you may want to know the hot water pump is broken, or you may want to know the water is too hot.

Michael Brundage:

So we provide that context, and we also do it by expanding the number of words we look at at a time. So I showed it for one word, and then maybe we start looking at words on either side of that to see if that phrase matters. So we do it in a variety of ways because we do realize that context is so important in an engineering [inaudible 00:39:46].

Stephen LaMarca:

Yeah. We got another question in from Dr. Pavel. He says, "Looking forward, I'm thinking the computerized maintenance management systems logging these natural language reports may be automated to suggest such annotations or apply them automatically. This would enable improving reports and graphs with actionable information. Do you think this may be a viable possibility for the future? What are some of the challenges you see with the automation of the annotation process?"

Michael Brundage:

I hope so. That's where we got the term tags from. So tags, if you think of Netflix or if you want to find a horror comedy or whatever it is, instead of having just one adaptation ... That's the problem with the cause effect problem model that we ... sorry, cause effect treatment model that we started with is that you start locking into one thing and it's hard. Like is this one cause or one effect? Versus a tag, it's like there are these 10 items, or there are these 10 problems, these 10 solutions.

Michael Brundage:

So if you had a management system that has already seen this word, absolutely. I think that is something we want to happen: the technicians write it however they want, and then you start adding the meta information as part of it. The challenge being, one, how does that evolve over time if you lose assets? As an example, if they get replaced or whatever it may be, you may lose those words with them. Maybe they're not relevant, or they were only relevant to that asset.

Michael Brundage:

People retiring is the same type of thing, because maybe that person that uses kaput isn't here anymore, so we need to stop talking about it. I find myself questioning why I'm still talking about kaput now even. But you have to worry about these people retiring. The nice thing is, our goal is we don't want to lose it. You want to at least have it forever, but there are challenges that are not necessarily addressed by the research yet, and we want to continue to address them.

Thomas Feldhausen:

I can't get over, Michael, that one example you gave during your presentation where you took ... I think it was either 30 or 50,000 descriptions, and you were able to categorize them within an hour, which absolutely seems insane. I get where technicians ... you don't want to give them certain categories where everything must fit into them, but is there value in having kind of subcategories? Does that make things go faster, or does it not really matter?

Michael Brundage:

I think it depends. So I think one of our goals is to find the right number. It's hard because, if you have too few, it doesn't mean anything. If I just said, classify if this is a word, that doesn't tell you anything. We started with item, problem and solution because we felt that it fit the maintenance domain really well, and one of the topics of discussion [inaudible 00:42:22] TLP is what the right number of classifications is.

Michael Brundage:

We had an independent person, a person we worked with at the University of Western Australia, who does item, action and state. Actually, it kind of gets to the same point. But now we're even talking about whether it is important to have time durations and things of that sort. So it really depends on how many types of classifications and who you want to ask, because it may not always be a technician. You might be able to ask us, who maybe aren't familiar with one specific part of the floor, for those time durations. Go through and just figure out, is this a week, month, day, whatever it may be, and then you're done. Then you pass it along to the technician for the jargon, abbreviations and things of that sort.

Michael Brundage:

I just think it's hard to give ... You don't want to give them all of the work because they will see it as an extra task. You want to find the right balance where you're getting the most bang for your buck.

Thomas Feldhausen:

That makes a lot of sense. I don't know about you, Steve, but Michael's answers on my questions, they're kaput.

Stephen LaMarca:

Yes, to use it one last time, sure. Mike, it has been a huge pleasure. It's always a treat getting a visit from NIST. So thank you so much for joining us today.
