Mar 10, 2010

Modeling with SQL Server Modeling

As SQL Server Modeling (the artist formerly known as "Oslo") is nearing it's release date, I wanted to try out some custom data modeling with it. Note that I'm only using the "M" part of the equation here, as SQL Server Modeling consists of three parts:
  1. “M” is a highly productive, developer friendly, textual language for defining schemas, queries, values, functions and DSLs for SQL Server databases
  2. “Quadrant” is a customizable tool for interacting with large datasets stored in SQL Server databases
  3. SQL Server Modeling Services (formerly the “Oslo” Repository) is a SQL Server role for the secure sharing of models between applications and systems
*) Ad-text from Microsoft

The purpose of this exercise is to model a simple schema consisting of a model(s) that have entities that have attributes with a type. A sample dataset would look like the following:

ModelName {
  Entity1 {
    Attribute1 Text(255),
    Attribute2 Number
  },
  Entity2 {
    Attribute 3 Number
  }

},
AnotherModel {
  ...
}

Note. The following requires Visual Studio 2010 Beta 2; unfortunately, VS2010 RC won't work with this and there probably won't be a r$efresh as Visual Studio RTM's release date is almost here.

The second thing you'll need is the actual SQL Server Modeling installation, that you can find here.

Caveat emptor: As there isn't that much information available on "M", I pretty much did this with trial and error. Most likely parts of it could be done easier then I did here, but I hope I'm saving you a few hours figuring them out.

-==- Tools -==-

Now that you have installed your SSM (you have, right?), let's open our Intellipads.



I prefer to open the schema to the middle, data input to my left and result to my right, but you can configure it how ever you like. Create two files *.mg and *.txt. Now split the txt-file with a vertical slice.
After that, select DSL Grammar mode for the mg-file and mg-filename mode for both of the txt file slices. Then, select the right most slice (by clicking it) and from DSL menu, select "Switch to output buffer". Now when you change your syntax, your data file should automatically show errors in it's syntax and when you change your data file, you immediately see the output of the data based on the grammar.
Yes, it's a bit of a complex procedure but it shouldn't be that hard. The Only Easy Day Was Yesterday, so let's move on to the harder part of the show.

-==- Syntax parsing -==-

Okey, let's start with basics.

module MDSModel {
    language Model {
        syntax Main = ...

        interleave ws = " " | "\r" | "\n";
    }
}


Module and language are not terribly important in this case, so you can ignore them for now. More important parts are "syntax Main" and "interleave". "syntax Main" is used as a starting point of our structure. "Interleave" is a list of characters or types we wan't to completely ignore. Here we are saying that in our language whitespace is not significant, or at least that space and line ending aren't.

There are a few constructs the we need to be aware of. "token" keyword is used for identifying fixed parts of the text. For example:

token number = "0".."9";
token char = ("a".."z"|"A".."Z");

Note that we used the ".." to denote that number starts with text token "0" and ends with "9" and includes everything between them. Similarly, we are saying that with char, but we are also using the pipe ( | ) character to "OR" the clauses together. So char can be anything from a to z and from A to Z.

"syntax" is used to group together various tokens or other (or same in recursion) syntax constructs. For example:

syntax Model =
    m:ModelName "{" e:Entities "}"
    => Model { m, e };

Here we are saying that Model construct contains a ModelName token that has Entities constructs inside braces. We haven't defined them yet, but I hope the idea is clear enough. Note that we are also using shorthand markers for both ModelName and Entities so that we can more easily use it when actually constructing the result. The result of the syntax is described after "=>". Braces in result means that we are constructing a new object; and here specifically, we are saying that construct a object named "Model" that contains an ModelName and Entities.

The third type is .. well, type. Types can be used to define complex data structures. I'm not well versed enough to comment on type usage, so I'll suggest you check out MSDN documents.
If you're going to upload your model to SSMS, these will come handy (and are required), but for now we'll skip these as we are not going to need them.

Let's start with the easier one of these and enter what ModelName should look like.

token AlphaNumeric = char|number;
token ModelName = a:AlphaNumeric+ => Name { a };

AlphaNumeric denotes that is either a char or a number token that we defined above. After that, we can define ModelName as a array of AlphaNumeric tokens. Here we are requiring that at least one AlphaNumeric token exists. When creating a 0..n array instead of 1..n array, use a star (*) instead of plus (+).

The result of ModelName is a object named "Name" that contains the contents of the AlphaNumeric array.

For example, the token:
Hello

..would be converted to a construct:
Name { Hello }

Let's simplify a bit our earlier Model syntax so we can see actual results in Intellipad and modify it to:

syntax Model =
    m:ModelName "{"  "}"
    => Model { m };

Now type to the input slice in your Intellipad:
HereIsTheName { }

And you should see the following on your right slice:



But this way, we can't add multiple models, so let's add some recursion:

syntax Models =
            m:Model => Models [m]
            |
            ms:Models "," m:Model
            => Models [ valuesof(ms), m ]
            ;

So if we have one model like we have in above data, we get an array (which is denoted by brackets) of objects named Models that contain one model. With data:
Name1 { },
Name2 { }

..we would use the "OR" case of Models syntax as it matches the Models "," Model syntax. Therefore, we get an array of objects named "Models" that contain all the Model objects. A very important thing to note here is that we are using valuesof-function. Valuesof-function will flatten the structure of Models, so we get [ Model, Model ] array instead of [ Models, Model ] array, that would be inconsistent with the rest of our structure.



Here's how it would look like without the use of valuesof:



Now that we've established the basic syntax, let's continue refining our model.

Let's change the Model syntax to what it was:


syntax Model =
            m:ModelName "{" e:Entities "}"
            => Model { m, e }
            ;

And add the Entities (collection) and Entity syntaxes.



token EntityName = a:AlphaNumeric+ => Name { a };



syntax Entity =
            e:EntityName "{" "}"
            => Entity { e }
            ;

syntax Entities =
            e:Entity => Entities [ e ]
            |
            es:Entities "," e:Entity
            => Entities [ valuesof(es), e ];

As you can see, we're following the same pattern as before.

After these changes, your data slice should show an error as we have not entered any Entities to a model, which is required. Pressing ctrl+shift+e you'll get a detailed report of whats missing.


Changing the data to what's required, we now have a Model/Entity hierarchy.


Adding the attribute syntax should by now be a piece of cake, so I'll skip it (otherwise this post will go on forever) but it'll be on the source code.

One interesting thing I would like to point out, is that you can short hand name any named "object" as we've seen before, but you can also short hand multiple pieces of syntax by wrapping it inside parenthesis like this:

shorthand:(Entities "," Entity) => { shorthand };



-==- What's next -==-

Now that we have this syntax, what can we do with it?

Well, as I understand it, you can upload the model to SQL Server Modeling Services and input data and query it using Quadrant. This though requires (if I understood it correctly) that you use "type" constructs, so that SSMS knows how to create correct data types for the model. (More on that later or you can check the PDC 2009 video links.)

For another, you can programmatically use the mg-file to parse users input files. I'll elaborate more on this on my next post. If you don't want to wait, you can check out Jeff Piknston's post on dynamically using m grammar parsing results. You should though watch the PDC video before using it.

-==- Source code -==-

.. can be found here with a simple dataset.

-==- Additional resources -==-

You can find tutorials on MSDN.

You should also check out the PDC 2009 videos:

Microsoft Project Code Name “M”: The Data and Modeling Language (Don Box, Jeff Pinkston)

The ‘M’-Based System.Identity Model for Accessing Directory Services
SQL Server Modeling Services: Using Metadata to Drive Application Design, Development and Management
Building Data-Driven Applications Using Microsoft Project Code Name "Quadrant" and Microsoft Project Code Name "M"

.. and the blogs ..

http://blogs.msdn.com/mlanguage/ -- You can see the list of whole "M" team's blogs on the left.


0 comments:

Post a Comment