An NLP-Based Architecture for the Autocompletion of Partial Domain Models

Burgueño, Loli; Clarisó, Robert; Gérard, Sébastien; Li, Shuai; Cabot, Jordi

doi:10.1007/978-3-030-79382-1_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12751)

Included in the following conference series:

International Conference on Advanced Information Systems Engineering

3020 Accesses
29 Citations

Abstract

Domain models capture the key concepts and relationships of a business domain. Typically, domain models are manually defined by software designers in the initial phases of a software development cycle, based on their interactions with the client and their own domain expertise. Given the key role of domain models in the quality of the final system, it is important that they properly reflect the reality of the business.

To facilitate the definition of domain models and improve their quality, we propose to move towards a more assisted domain model building process, in which an NLP-based assistant provides autocomplete suggestions for the partial model under construction. These suggestions are based on the automatic analysis of the textual information available for the project (contextual knowledge) and/or its related business domain (general knowledge). The process also takes into account the feedback collected from the designer’s interaction with the assistant. We have developed a proof-of-concept tool and performed a preliminary evaluation that shows promising results.
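
To make the suggestion loop concrete, here is a minimal sketch. It assumes that general knowledge is represented by word embeddings and that candidate concepts are ranked by their average cosine similarity to the concepts already in the partial model. Designer feedback is reduced to its simplest form: rejected suggestions are never proposed again. The function names, the scoring rule and the toy vectors are illustrative assumptions, not the architecture's actual implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def suggest(partial_model, embeddings, rejected, k=3):
    """Rank candidate concepts for a partial domain model (hypothetical helper).

    partial_model: names of concepts already in the model under construction.
    embeddings: word -> vector map standing in for "general knowledge".
    rejected: suggestions the designer dismissed earlier (feedback).
    Returns up to k candidates, best first.
    """
    known = [embeddings[c] for c in partial_model if c in embeddings]
    if not known:
        return []  # nothing to anchor the similarity on
    scores = {}
    for word, vec in embeddings.items():
        if word in partial_model or word in rejected:
            continue  # never re-suggest existing or rejected concepts
        # Average similarity to the concepts already in the model.
        scores[word] = sum(cosine(vec, m) for m in known) / len(known)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]

# Toy 3-dimensional vectors (hypothetical, not taken from a real embedding).
embeddings = {
    "customer": [0.9, 0.1, 0.0],
    "order": [0.8, 0.3, 0.1],
    "invoice": [0.7, 0.4, 0.0],
    "payment": [0.75, 0.35, 0.05],
    "planet": [0.0, 0.1, 0.95],
}
partial = {"customer", "order"}
print(suggest(partial, embeddings, rejected=set(), k=2))        # ['payment', 'invoice']
print(suggest(partial, embeddings, rejected={"payment"}, k=2))  # ['invoice', 'planet']
```

Once the designer rejects "payment", the next-best candidate moves up. This shows feedback changing future suggestions without retraining anything. Contextual knowledge from project documents could be layered on top, for instance by favouring candidates that occur in those texts. The sketch does not attempt this.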

Supported by Spanish project TIN2016-75944-R and CEA’s initiative Modelia.

Notes

1.
According to the Cambridge dictionary: “information on many different subjects that you collect gradually, from reading, television, etc., rather than detailed information on subjects that you have studied formally”.
2.
Note that “NLP model” and “domain model” refer to entirely different kinds of model. In the NLP field, a model is the result of analyzing a textual corpus (e.g., a trained neural network or a statistical model). To avoid confusion, in this work we always write “NLP model” when referring to one, and never “model” alone.
3.
https://nlp.stanford.edu/projects/glove/, https://wikipedia2vec.github.io/wikipedia2vec/pretrained/, https://code.google.com/archive/p/word2vec/.
4.
Note that, for each model, there is a finite number of slices.
5.
In linguistics, lemmatization is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word’s lemma, or dictionary form.
6.
https://github.com/modelia/model-autocompletion.
7.
https://www.eclipse.org/modeling/emf/.
8.
These documents are not publicly available due to industrial property rights. Nevertheless, the software artifacts derived from them are available in our Git repository.
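
Notes 3 and 5 mention pre-trained word embeddings (GloVe, wikipedia2vec, word2vec) and lemmatization. The sketch below shows one way these two ingredients could fit together in preprocessing. Each concept name is first reduced to its lemma and then looked up in a vector file. It assumes the plain-text vector layout used by GloVe's distributed files, with one word per line followed by its vector components. It also skips the "<vocab_size> <dimensions>" header line that word2vec's text format adds. The notes do not say which file format or lemmatizer the tool uses. The lemma table, function names and sample vectors are hypothetical.

```python
import io

# Hypothetical lemma table for illustration only; real NLP tools derive
# lemmas through morphological analysis rather than a fixed dictionary.
LEMMAS = {
    "orders": "order", "ordered": "order",
    "customers": "customer",
    "paid": "pay", "pays": "pay",
}

def lemmatize(token):
    """Map an inflected token to its dictionary form (lemma), lowercased."""
    t = token.lower()
    return LEMMAS.get(t, t)

def load_text_embeddings(stream):
    """Parse plain-text vectors, one entry per line: word v1 v2 ... vn.

    This is the GloVe text layout; the word2vec text format adds a first
    line "<vocab_size> <dimensions>", which is skipped here if present.
    """
    vectors = {}
    for line_no, line in enumerate(stream):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # word2vec-style header
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors

def vector_for(token, vectors):
    """Look up a token's vector after lemmatization; None if out of vocabulary."""
    return vectors.get(lemmatize(token))

sample = io.StringIO("2 3\ncustomer 0.9 0.1 0.0\norder 0.8 0.3 0.1\n")
vectors = load_text_embeddings(sample)
print(sorted(vectors))                  # ['customer', 'order']
print(vector_for("Orders", vectors))    # [0.8, 0.3, 0.1]
print(vector_for("invoices", vectors))  # None
```

Lemmatizing before lookup lets inflected forms found in the project texts ("Orders", "ordered") map to the same vector as the class name "order". Words missing from both the lemma table and the vector file simply yield no vector.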

Author information

Authors and Affiliations

Open University of Catalonia, Av. Tibidabo, 39-43, Barcelona, Spain
Loli Burgueño & Robert Clarisó
Institut LIST, CEA, Université Paris-Saclay, Avenue de la Vauve, Palaiseau, France
Loli Burgueño, Sébastien Gérard & Shuai Li
ICREA, Barcelona, Spain
Jordi Cabot

Corresponding author

Correspondence to Loli Burgueño.

Editor information

Editors and Affiliations

The University of Melbourne, Melbourne, VIC, Australia
Marcello La Rosa
The University of Queensland, St Lucia, QLD, Australia
Shazia Sadiq
Universitat Politècnica de Catalunya, Barcelona, Spain
Ernest Teniente

About this paper

Cite this paper

Burgueño, L., Clarisó, R., Gérard, S., Li, S., Cabot, J. (2021). An NLP-Based Architecture for the Autocompletion of Partial Domain Models. In: La Rosa, M., Sadiq, S., Teniente, E. (eds) Advanced Information Systems Engineering. CAiSE 2021. Lecture Notes in Computer Science, vol 12751. Springer, Cham. https://doi.org/10.1007/978-3-030-79382-1_6

DOI: https://doi.org/10.1007/978-3-030-79382-1_6
Published: 24 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79381-4
Online ISBN: 978-3-030-79382-1
eBook Packages: Computer Science, Computer Science (R0)