Language of Study: German
Subject Major: Computer Science
Country of Study: Germany
Title of Work: The Development of an SQL-Parser for Educational Database Modification
Abstract: The primary goal of all universities should be to improve the quality of the education that they offer. In order to present computer science students with a more interactive database modification experience, a team from the computer science department of TU Dortmund is developing a free, editable database program that can be hosted locally on various OSs. An important component of the program is a custom SQL-Parser that will prevent illegal commands from the users that would cause a failure of the database program. An insight into a parser’s general role in digital applications is a focus of this Bachelor thesis, as is the construction of the specific parser required by the larger project in development by TU Dortmund and necessitating an exploration of the common C++ computer language and open-source software such as Flex and Bison.
Title of Work (IGP Language): Die Entwicklung eines SQL-Parsers für pädagogische Datenbankmodifikation
Abstract (IGP Language): Das primäre Ziel aller Universität sollte es sein, ihre Qualität der angebotenen Ausbildung zu verbessern. Um Studierenden der Informatik eine interaktivere Erfahrung mit Datenbankenmodifikation zu bieten, entwickelt eine Mannschaft des Fachbereichs Informatik von TU Dortmund ein kostenlose editierbare Datenbankprogramm, das lokal auf verschiedene Betriebssysteme gehostet werden kann. Eine wichtige Komponente des Programms ist einen spezialangefertigten SQL-Parser, der illegale Befehle, die das Versagen der Datenbank verursachen würde, von den UserInnen verhindern wird. Ein Einblick in die allgemeine Rolle eines Parsers in digitalen Anwendungen und die Konstruktion des spezifischen Parsers, der von dem größeren Projekt derzeit in der Entwicklung an TU Dortmund gefordert wird und eine Auseinandersetzung mit der gemeinsamen Computer-Sprache C++ und Open-Source-Software wie Flex und Bison benötigte, sind die Fokusse dieser Bachelorarbeit.
Click Here to View Elevator Pitch
Elevator Pitch Transcript: Hello. My name is Savannah Fischer, and I am majoring in Computer Science and German. In March, I returned from a six-month study abroad program in Dortmund, a city in the German federal state of North Rhein-Westphalia that rests an hour from the Netherlands border, because of the increasing uncertainty surrounding COVID-19 and global infection rates. From March until the end of June, I continued to remotely participate in an internship I had received while still in the country. This internship, held through the Technical University of Dortmund’s Information Technology Department, consisted of the construction of a rudimentary SQL-Parser using open-source software compatible with various common operating systems. Eventually, this program will be attached to a larger free database application that will be distributed to the technically-focused members of TU Dortmund’s student body, so that they may construct, deconstruct, and modify elements and structures in their database to give them a more hands-on manner of approaching related curricula within their computer science-oriented courses.
Constructing this parser required review of the implementation of a predicate logic tree, of organizing given commands based on priority and whether or not they would be applied in conjunction to other commands. A basic understanding of Flex lexers and the greater Bison grammar input files was also necessary to develop. Lexers are used to create symbols and keywords out of the otherwise meaningless characters submitted by the program’s users, and the grammar files organize those symbols, defining their legal uses when combined, just as human languages have rules about how their words must be used to form legal sentences. Together, these files construct a program capable of determining if valid SQL commands have been submitted, or if the application needs to halt reading to manage any errors from improper text submissions that fail to match any of the defined keywords or keyword combinations.
If parser construction and an exploration of the formation of the SQL database language are subjects that you’d like to learn more about, my full presentation of my experience with this project will be available soon. Join in. I’d love to hear what you think and to answer any questions you may have. See you soon! Bye.
Click Here to View Full Length Video
Full-Length Video Transcript: Hello, my name is Savannah Fischer, and my presentation is about the development of an SQL-Parser for educational database modification and my German experience.
There are four main parts to my presentation: general information about my fieldwork in Germany, the specific tasks and experiences related to my project and related to my pre-project preparations, and my experiences with German culture and speech development in regard to the topics of my fieldwork.
I began my German fieldwork experience in February with the Computer Science Department of the Technical University of Dortmund. A team of graduate students under the guidance of Dr. Professor Jens Teubner wanted to create an editable database in order to give computer science students a more interactive learning experience in regard to database courses. To prepare for the realization of this project, I was delegated the task of developing a custom Parser for the SQL database language. This parser would ensure that errors resulting from input can be easily handled and that the database project’s functionality regarding input recognition can be improved.
While I was working in Germany, I received new about COVID-19, specifically how it was spreading through France. There were soon reports that the virus was in Bavaria and then in the area surrounding Dortmund. My workplace closed in order to restrict the spread of the illness, and we were told that all of our work should be completed remotely. That same day, NAU sent off an evacuation notice. Luckily, and because all of the other members of my team would need to remain outside of the office, I was allowed to finish my fieldwork after I returned to the United States.
Now I will talk about my project.
SQL is a computer language that is considered the most commonly used Query Statement Language for relational databases, which are databases whose data is structured for easy access.
A Parser evaluates an entered character string and determines if it exists in the grammar of the desired language (in this case, SQL). Entered character strings that are not in the grammar throw error messages and are handled so that the errors do not cause problems in the rest of the related programs.
Flex, or the fast lexical analyzer generator, is a tool used for the creation of lexical analysis and for symbol production. Flex works with Bison, a tool for parser generation, in order to build a complete Parser. More details about Flex and Bison will be provided later in the presentation.
Before beginning construction of the file that will house the Parser, I had to reexplore the structure of a predicate tree. A predicate tree is a tree-like model for ordering steps in the combination of Boolean logic, such as greater than; less than; and equal to, and logical operators, such as the commands AND and OR. The order of the steps changes depending on the locations of the parentheses in the code.
The Where-clause in SQL was the focus of my project preparation, where I was tasked with making a program that would build a predicate tree.
Here are two examples of simple SQL-statements. The circled portions are the Where-clauses. Both of these clauses result in the very same predicate tree, so if parentheses only exist in the beginning of the Where-clause, it is as if there are no parentheses in the clause at all.
However, when parentheses are at the end of the Where-clause, the structure changes. “Y is less than two” and “Z is equal to 5” are now parts of the OR-Section, and the OR-Section is now part of the AND-Section, despite how it was previously independent.
A larger number of sections with parentheses in a Where-clause result in the increasing complexity of the predicate tree.
A goal of an SQL-Parser is to organize every SQL statement. In preparation for this, it is important to explain to the machine how a predicate tree, through code, is formed. A header file Is used in order to build a framework for the construction of a predicate tree. The parts of organized sections of an SQL statement are reduced to their most basic components, into “Nodes” and “Connections”. Various Functions are also defined in order to aid in the efficient combination of components and the eventual creation of a predicate tree.
Before a parser can be built, symbols that the parser can understand must be defined. A Lexer file serves as the Parser’s dictionary. Without the Lexer, character strings are meaningless. For example: the word “cellphone” has a meaning. “Cellphone” is an object, a real thing. And when there is no “cellphone” to be found, “cellphone” is still recognized as a word with meaning. However, the computer cannot make that connection. It only knows that the word “cellphone” is a combination of random characters.
Most of the defined symbols for my project are simple. By that, I mean that I had to tell the Computer that the specific combinations of characters or shapes, for example the letters F, R, O, M in that specific order, result in or return whole symbol-words. Then, when the Lexer sees F, R, O, and M without spaces or letters or shapes between the letters, the Lexer recognizes that these combined letters create a symbol.
The symbols in a Lexer are more complex when the machine has to remember an exact input. For example: databases have user-defined table names and attributes. In these instances, it is important to not only return a symbol to represent what kind of input it is (like a word, a number, or a combination of both), but also to create a variable that will hold the value and allow it to be retrieved and manipulated in the future.
With the Lexer’s defined symbols and the structure provided by the header file, the grammar file, the primary program file of the parser, can be constructed. The grammar file organizes Lexer-symbols in user-defined “phrases” where only singular symbols exist. In some cases, the phrase is empty, which means that there is a chance that no symbols are expected to be used by the phrase when the phrase is utilized.
Phrases can then be organized into “Sentences”. Various phrases and even other sentences can be part of a sentence, which means that the symbols of the phrases in the user-input are read. When a sentence calls itself, that means that the symbols in the sentence occur either multiple times or only once. A sentence continues to be read recursively until a phrase or series of phrases, that can each only be read as one symbol and not many reoccurring symbol groups, allows the sentence to end.
While the symbols are being read, the symbols are constantly being organized into predicate trees and then combined into a larger predicate tree. If the user input does not match a sentence patter, then the user-input is illegal. The parser throws an error, and a warning message is sent to the user.
The result of my fieldwork was a complete parser program that organized the various parts of many different SQL statements into a predicate tree of predicate trees. The parser program would also display error messages when appropriate.
Because of the evacuation during the beginning of the semester reserved for fieldwork, I have a unique perspective regarding Germany’s work culture. In my experience, the expectations of the German Workers is more prominent, but they are less pressured than workers in the USA. By that, I mean that it is less likely that supervisors and coworkers will follow other workers around and watch over their progress. They expect the new workers to have the skills necessary to complete their assigned tasks, or else have the ability to quickly gain the knowledge necessary to do so. Conversely, supervisors and coworkers are more likely to hover around new hires until they are certain that the new workers are familiar with the rules and common tasks of the trade.
Germany is also more socially connected and organized than the USA. While I worked with the Computer Science Department at TU Dortmund, I found it interesting that lunch was a group event. It was a stark difference from what I’ve experienced in previous internships that I’ve held in the US; all breaks were considered individual, often isolated experiences. There are also larger celebrations more often in Germany, and they come with the expectation that those involved should celebrate in public spaces. In the USA, it is often the case that individuals celebrate holidays in the comfort of their own homes, or else only in the company of family members and close friends. This more poignant togetherness of the German culture influenced the country-wide reaction to the COVID-19 pandemic. Non-essential businesses were quick to close, quarantines were efficiently imposed, and public locations became not as populated. Through this, Germany was rated as one of the best European countries in regard to their pandemic reaction planning.
Before I was involved with my fieldwork, I attended TU Dortmund as student, and the majority of my courses concentrated on intermediate German speaking skills and on vocabulary. The shift from learning about the structure of German as a language to learning about the structure of a computer language necessitated a personal analysis of the relationship between both.
Like computer languages, human languages have “symbols”, but we only call them “words: . A phrase is the fundamental meaning of synonyms, and a sentence is a sentence in the traditional meaning of the word. However, the differences between the two language types is a great interest. Computer languages are like human languages, but only when a human language is used as the ideal of itself, in that the “ideal” is considered the user of the smallest parts of a language that allows one to creating meaning. There is no place for slang, for contractions, or for the sudden creation of new words, which is popular in Germany. Computer languages are composed of rigid rules, and any miniscule deviation results in an error and the inability for the computer to understand what the user or speaker is trying to convey.
Conversely, human languages, like German, for example, are much more flexible. Every Generation places new words in conversational speech, and if there is no word to describe what they want, a native German citizen will combine various words until they have a word with the definition they desire. For example: if someone wanted a singular word for “Gesellschaften für den Schutz der Rechte der Versicherung“ (businesses for the protection of the legal rights of insurance), which is how it might be described in computer languages, there exists the single word “Rechtsschutzversicherungsgesellschaften” in Germany, wich means the same thing.
Although computer and human languages are so different, it is important to understand that the simplest parts of a language, of which computer languages are composed, form the foundations of the more complex human languages. If one better understands the foundations of language, they can more easily manipulate the more complex aspects of all languages. This is the difference between knowing German words and the ability to hold a meaningful conversation with a fluent German speaker.
Many thanks to Frau Jessica Wood, Herr Dr. Prof. Jens Teubner, Jan Mühlig, Max Berens, Lukas Schreiber, Annika Baackmann, und my family.
Thank you for coming to my presentation. I hope you enjoyed it. Goodbye.
Subject Major Session via ZOOM: Zoom Link Subject Major Password: SUMMIT
Language Session via ZOOM: Zoom Link Language Session Password: SUMMIT
Keywords
Parsers, SQL, Databases, Computer Languages, Technology, Computer Science, German, Programming, Coding
Savannah Fischer
Language
German
Description
Country of Study
Germany
Computer Science