Discussions on Music modeling

Use Cases

OK, I'm going to skip how easy or hard these would be to model and just throw out some interesting use cases:

I like [name of famous artist], what has he ever been involved in? What did he do?
I like [name of famous artist], give me lists of other artists [mastered, mixed, produced, sampled, toured with], etc.
I like [this track], what are other tracks that sound like this? <-- always near impossible, always interesting
[Artist] has recorded with how many other people?
[Artist] has played where? when? who else were they on tour with?
[this track] was sampled in [these songs]
[Artist] used [instrument] on [this track], [Artist] used [instrument] through [instrument modifier] on [this track]
Albumless tracks: Brian Eno created the Windows startup sound.
[Track] is in what key? scale? studio recorded at?

just a couple off the top of my head, some obviously more valuable than others.
anyone else?



Albums vs. Releases

When we first imported music data, we made some data modeling decisions based on not wanting to over-stress the database. These considerations are no longer relevant.

One of those decisions was that we don’t have a clean division between albums and releases. We identify a group of releases as all being of the same album; we then pick one of those releases and designate it as the album. The other releases are then releases of that album, but the primary release is implicit, bound up with the album itself.

I’d like to change this, making the album explicit and separate from the release. The way this would happen is that every album would be duplicated; the new entity would be a release, and would be marked as the first release of the album. The MusicBrainz identification code—which is really tied to the release, not the album—would be migrated to the newly-created release.
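A minimal sketch of the migration described above, assuming a simplified in-memory model. The class names, field names, and the `split_primary_release` helper are hypothetical illustrations, not the actual Freebase schema or loader.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    name: str
    tracks: List[str]
    musicbrainz_id: Optional[str] = None
    is_first_release: bool = False


@dataclass
class Album:
    name: str
    tracks: List[str]
    # Before the migration, the MusicBrainz ID sits on the album itself.
    musicbrainz_id: Optional[str] = None
    releases: List[Release] = field(default_factory=list)


def split_primary_release(album: Album) -> Release:
    """Duplicate the album as an explicit first release and move the ID to it."""
    first = Release(
        name=album.name,
        tracks=list(album.tracks),
        musicbrainz_id=album.musicbrainz_id,
        is_first_release=True,
    )
    # The ID identifies the release, not the abstract album.
    album.musicbrainz_id = None
    album.releases.insert(0, first)
    return first


# Hypothetical data for illustration only.
album = Album("Example Album", ["Track 1", "Track 2"], musicbrainz_id="mbid-example")
first = split_primary_release(album)
```

After the call, the album keeps its track listing but no longer carries the identifier; the new release does.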

Any comments?

Chris -- do you have a strawman schema for this to look at?

This would use the existing album and release schemata; this is a data modeling change only in how they are used.

This sounds right to me, and it's consistent with the way jeff is thinking about the book data. I expect it applies to other kinds of data as well.

This is implemented on sandbox; go poke around any album there, and see how it would look.

I think this works. I think there are some redundant fields between album and release: "release date", "running time", and "label". On the album type, I think "initial release date" is a useful property, otherwise I think these belong on release.

More thoughts. I've put in some of the 7" single releases for Billy Bragg's "Greetings to the New Brunette" on sandbox (this link will self-destruct tonight!). It was released in three countries, with occasionally different B sides, which is apprehended well by this model. However, it leaves me with a question about what to put in the "compositions" property on "album" -- just the A-side? the A-side and all B-sides? the A-side and B-side of the first release?

The other issue I saw is that the producer is credited differently in different countries -- John Porter and Kenny Jones in the UK and W. Germany, just John Porter in Spain. Do we have any sense of how common this is, or is this just a fluke? (Chances are that the producers are the same and it was some contractual thing/label screw-up that caused Jones to be dropped on one release, but that's just speculation.)

Jeff, please see the new thread on single modeling.

Now, as for the other questions Jeff raises: the “Compositions” property of Musical Album is really intended to address compositions that span tracks and are complete, such as a symphony recorded as an album. In this case, I wouldn’t bother with the compositions on the album, just on the tracks themselves. As for different producers, I would try to find out who actually produced the album and credit it correctly. My guess here would be that the UK/German release included one track produced by Porter and one by Jones, while the Spanish one included two tracks produced by Porter, or that the A side was by Porter and the Spanish label ignored the B side. But in this case, I would go for the more inclusive credit at the album level. We may need a producer property on Musical Release, though, as this might well come up with bonus tracks on a re-release or a remastered release.

Working backwards… Jeff, I’m fine with changing the “Release Date” property on Musical Album to “Initial Release Date.” I wonder about dropping “Running Time,” though; I find it interesting whether it was a 37-minute album or a 73-minute album. But I could be convinced either way. Anyone else have an opinion?

I think it's a useful property, I just think it makes more sense to be on release, since different releases will have different running times (due to bonus tracks, alternate tracks in different countries, etc.).

"I think it's a useful property" = "running time".

Re producers: one possibility would be to model producer at the track level, which would eliminate the ambiguity (but which is also much harder information to come by).

I think you definitely want "running time" on the release. As to whether it should be on the album as well, I'm a little torn. I agree with Chris that it's potentially useful to have it attached to the album, but given that the model is moving in the direction of more abstraction at the album level, it may not make sense to have it there.

I think this is one of those cases where inherited properties ("transitive properties"? what were we calling them?) would be handy, e.g. "the running time of a release is the running time of the album unless a value exists for the release"
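The inherited-property idea can be sketched as a simple fallback lookup. This is only an illustration: the `Album` and `Release` classes and the `effective_running_time` method are hypothetical names, and the 37 and 73 minute values are borrowed from the example earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Album:
    name: str
    running_time_min: Optional[float] = None


@dataclass
class Release:
    album: Album
    running_time_min: Optional[float] = None

    def effective_running_time(self) -> Optional[float]:
        # Use the release's own value when one exists; otherwise inherit
        # the value recorded on the album.
        if self.running_time_min is not None:
            return self.running_time_min
        return self.album.running_time_min


# Hypothetical data for illustration only.
album = Album("Example Album", running_time_min=37.0)
original = Release(album)                         # no value of its own
reissue = Release(album, running_time_min=73.0)   # bonus tracks added
```

Here `original` reports the album's running time, while `reissue` reports its own.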

I really do wonder about this album modeling issue. I understand why you went with it to begin with, but I think there is a simpler way to model this.

If you consider each album as a collection of track recordings, you can bypass all the complexities of releases and different versions. When you consider the use case, most people want to associate ties between listening experiences. That experience is dictated by the recording, not the metadata associated with the album.

Singles can be tied to relevant tracks.
7-inch singles are versions of 7 tracks.
12-inch singles are versions of the same 7 tracks.
Promo releases of an album that were released, then recalled, then re-released under a different mastering engineer are still the same 7 tracks.

This would allow one to infer a "meta-album" between multiple releases. If you have 6 different releases, but the tracks are exactly the same, you can infer that this collection is one consistent "album" regardless of how many forms it has. To work backwards from the album adds a lot of complexity.

If a promo release has different tracks than the "official" release, why are they grouped together? In a purchase decision, these are separate products, even if they have similar components.

To solve the cluttered discography issue, you can set a view for similarity. If more than 70% of the tracks are similar, view as one meta-release.
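One way the 70% rule could look in practice, as a rough sketch. It assumes tracks are compared by identifier and that overlap is measured against the larger of the two releases; both choices, along with the function names, are assumptions made here for illustration.

```python
from typing import Dict, FrozenSet, List


def track_overlap(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Fraction of shared track IDs, relative to the larger release."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def group_meta_releases(
    releases: Dict[str, FrozenSet[str]], threshold: float = 0.7
) -> List[List[str]]:
    """Greedily group releases whose overlap with a group's first member
    exceeds the threshold."""
    groups: List[List[str]] = []
    for name, tracks in releases.items():
        for group in groups:
            if track_overlap(releases[group[0]], tracks) > threshold:
                group.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([name])
    return groups


# Hypothetical releases identified by made-up track IDs.
releases = {
    "UK LP": frozenset({"t1", "t2", "t3", "t4"}),
    "US LP": frozenset({"t1", "t2", "t3", "t4"}),
    "Promo": frozenset({"t1", "t9"}),
}
groups = group_meta_releases(releases)
```

With this data the two LPs collapse into one meta-release and the promo stays separate. A greedy pass like this depends on input order, so a real view would probably want a more careful clustering step.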

I only suggest this because why would someone want to find their way back to the album through the tracks, if the tracks do not contain any of the "fun" information?

I'm a bit confused. Isn't what you're proposing pretty much the same as what Chris was proposing? That there is a more abstract "album" that would contain the human-friendly meta-information and that there would be aggregations of releases that were very similar?

Adam, this is, as Robert suggests, where we’re headed. The missing use case is the one where I slap a piece of plastic into my computer’s drive and want to look it up, or I have an audio file that I ripped a long time ago or which I bought on-line, and which I want to look up. (I am not proposing that Freebase should be the lookup service, but having found an identifier from MusicBrainz, how much information can I then glean from Freebase?) In that case, I need a database of separate, distinct tracks and releases. Now, we could aggregate all the relevant MusicBrainz identifiers onto the relevant albums, but I would like to know whether this is the shorter American mix or the longer UK mix, or whether it’s the lousy initial CD release or the gold-remastered high-quality later release.

As far as track identity, which you touch on, the idea is that when tracks are truly identical (for example, the single version is the exact same mix as the album version), the track should really be a single entity in Freebase. But if it’s been remastered (dance mix, short version, etc.) then it should be a different track. It should, however, be the same composition; I look forward to the point where we can get that rich information.

I guess what I'm getting at is: why must we associate the "meta-album" with a physical album? I would think that all meta-information should be tied to releases of CDs, but have the "album" be just a node. I say this because then DVDs about the album could also be tied to this node, as well as the artist, without having to incorrectly state that a particular actual release is the official manifestation of the recorded experience.

For looking up information about a track, if you have a live album, and there is a DVD of that recording in video form, shouldn't we be able to find that connection through Freebase? Wouldn't that connection be easier if "album" was a collection of manifestations, instead of having an official manifestation with many releases related to it? What is gained by associating a recorded experience with an official version of it, especially if that experience may not live completely in the auditory domain?

Adam -- Chris is picking a release that might be considered as the "definitive" release and denormalizing the tracks and other data to the properties on the Album. This is done as a convenience to someone writing an application who's not interested in sifting through the various releases to find a track listing when building a simple application.

To clarify: The new Album type is a collection of manifestations, where one manifestation is denormalized as a convenience. Even though this information suggests that it's a particular release, the Album should still be used in an abstract way. If you want to attach the Album to a DVD, you can.

I'm beginning to wonder, however, if instead of tracks, an Album should point to "recorded works", which are abstract in the same way.

I would definitely prefer that same level of abstraction on the track level.

The only reason I hesitate at the idea of a definitive release is that it's much more likely that "releases" will be added to a group of tracks than that another track will be added to a release. Wouldn't it make more sense to populate albums based on the tracks, or "recorded works", and not the other way around? A definitive "recording body"? Releases tend not to share much meta-data with the first album other than information that really could come from the track meta-data anyway. They often have different mastering engineers, release dates, formats, and sometimes even labels.

For one manifestation to be denormalized as a convenience, it should conveniently have meta-data that we are most likely going to use. Since it's the track meta-data that tends to re-appear and not the details of any one initial release, it seems to me it's worth bringing track information up for convenience instead of repeating album meta-data.

I am not sure I am completely following all the abstract jargon here, being elbows-deep in concrete instances. (-: This process is running on the production system right now, and will complete tonight. What it means is that (effectively) every album will have at least one release. Each album will have a set of tracks; those tracks will be the same as the track listing for the first known release.

As for abstract tracks, this is what the notion of Song and Composition are intended to address. MusicBrainz’ next-generation schema will have a notion of mix, recording, arrangement, and composition, for users that want to get to that level of detail, but I think that (for now, at least) that is beyond the scope of Freebase. I could be persuaded otherwise, but part of that persuasion would have to be a large cadre of users prepared to enter that data manually, as it’s not readily available anywhere that I’m aware of.

This operation is done. Every album for which we have any release information from MusicBrainz should now have at least one explicit release.

Singles as albums and releases

Jeff raises a good point about single releases. This stymied my attempt at comprehensive Jethro Tull information, for example. I was modeling a particular song, released as an A side of a single, as one album, with possible multiple releases with different tracks (for the different B sides). However, a particular pair of songs may show up as the B and A sides in another country!

So does anyone have a better idea of how to model singles? At one point, we were going to model these differently from albums, but decided to merge them together. The drawback of that is that if we list each permutation of singles as different first-class albums, the discography gets even more cluttered. Another idea would be to model singles as releases connected to the albums with which they’re associated, but I think that is confusing.

Suggestions? Make singles distinct from albums? What about EPs, then? If we keep singles as a kind of album, how different does it have to be to be a different album?

Some singles are not associated with an album at all, so I don't think they should be exclusively tied to the album. My other concerns were about composition (which you addressed -- I think some property hints would help, too) and producers, which we can continue to talk about, but which I don't think affects the single-release-as-album model.

The more I think about it, the more I think it makes sense to model singles and albums (and EPs) the same way. We should think a bit more about singles modelling, though. Are 7" singles and 12" singles different releases of the "album"? (Do people do this anymore? I realize that I have no idea how singles are released in the modern era.) Or are they different because, while they often represent the same song, they frequently feature entirely different versions of it?

I'm not sure it makes sense to think of singles as releases of an album - I think there will be enough cases where that's not true that baking it into the model might end up being clumsy. You might get more flexibility if singles are thought of as their own "albums" with their own releases. It may seem like overkill some of the time, but I think it still works. The B-side in many cases doesn't appear on the same album that the A-side was taken from and there is, as Jeff points out, the matter of different versions of album tracks appearing on the single.

In cases where the single really is a kind of promo for the album, you can still use the track names to find your way back to the album, so the connection is there, even if it's somewhat indirect.

Multipart releases

Please check out the Music domain on Sandbox, particularly the new Multi-Part Musical Release and Musical Release Component types. The idea is that a release of e.g. The Wall would be modeled as a single thing, a Multi-Part Musical Release, whose components would be The Wall (disc 1) and The Wall (disc 2). Right now, each disc is a release of a distinct album, and there is no good way of discussing the Columbia Records 2-CD release of The Wall.

This also deals with the merge problems; right now, the Wikipedia article about The Wall is merged with disc 2, with no connection to disc 1 at all. This is just wrong.

I think this works. Because a "multi-part release" is also a release, there is the possibility that data will be entered redundantly among the multi-part release topic and its sub-releases, but I a) don't see any way around it and b) don't think it's a very big deal (or terribly bad thing); the only real point of confusion that I can see is that when somebody queries for the releases that a track appears on, they might get both a multi-part release and one of its sub-releases returned, which could be confusing. Documentation will help with this, of course, and mjt apps could presumably be made smart enough to deal with this elegantly.

My preference—which I will clarify in the documentation after sandbox is renewed—would be that tracks are connected to the component, but not the package. So the track “Hey You” would be on the album The Wall, and on the release The Wall (disc 2), which is a component of the release The Wall. It is also worth noting that tracks don’t reciprocate the /music/release/track property; the album is considered of primary interest from the track’s point of view. Like Jeff, I don’t necessarily see it as a problem if someone is excessively complete about what tracks are on a release, but given the way the UI works, they won’t easily be prompted to add the tracks incorrectly, I think.
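A small sketch of the package/component arrangement, assuming tracks are attached only to components as preferred above. The class and function names are hypothetical and do not reflect the actual sandbox types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReleaseComponent:
    name: str
    tracks: List[str]


@dataclass
class MultiPartRelease:
    name: str
    components: List[ReleaseComponent] = field(default_factory=list)


def component_for(package: MultiPartRelease, track: str) -> Optional[ReleaseComponent]:
    """Return the component (disc) that carries the track, or None."""
    for component in package.components:
        if track in component.tracks:
            return component
    return None


def all_tracks(package: MultiPartRelease) -> List[str]:
    """Derive the package's track list from its components, in disc order."""
    return [t for component in package.components for t in component.tracks]


# Abbreviated track lists, for illustration only.
wall = MultiPartRelease(
    "The Wall",
    [
        ReleaseComponent("The Wall (disc 1)", ["In the Flesh?"]),
        ReleaseComponent("The Wall (disc 2)", ["Hey You"]),
    ],
)
disc = component_for(wall, "Hey You")
```

Because the package's track list is derived rather than stored, a query for the releases a track appears on would return only the component, which sidesteps the duplicate-result confusion raised earlier.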

Music Ontology

Have you guys seen The Music Ontology in RDF/OWL, www.purl.org/ontology/mo/ ? (Actually if you're not into reading raw RDF then you will probably prefer reading http://musicontology.com/)

There may be some ideas you can share... the same people are also working on a TV/Radio Programmes ontology for us at the BBC, which we're going to release publicly Real Soon Now...

Ah—these are the guys that Robert Kaye from MusicBrainz was talking to when we were meeting in London. Our goal is to be compatible with them, and with the next-generation MusicBrainz schema, but not necessarily have a complete one-to-one correspondence. We would like to be a little more flexible than a rigid ontology, for instance; note that they have taken MusicBrainz’ release type list verbatim, while Freebase allows users to enrich the categorization of release types. (For example, MusicBrainz and Music Ontology don’t allow an album to be a live spoken-word recording, or a remixed EP.)

MusicBrainz data updated

Artist, album, and release information has been updated from MusicBrainz. It’s not detailed yet, but the objects should all be there, labeled and connected correctly. See any problems? Let me know, please!

Pseudonyms going away

As posted on the data modeling mailing list, the Pseudonym type is going away. A new type, Creative Work, is being used to capture the literal credit on Musical Releases, but the artist relationship will connect to the actual artist, regardless of name.

This happened. One interesting side-effect is that many artists are currently named by their legal names instead of their better-known performance names. If you run across them, please do change them. We will be running a gardening sweep later to attempt to determine their best-known names (based on the Creative Work credits), but some human input certainly wouldn’t hurt.

Composition: Uses Melody From and Uses Chord Changes From

It’s been proposed, and experimented with briefly, to add two new pairs of properties to Composition: “Uses Melody From”/“Melody Used By” and “Uses Chord Changes From”/“Chord Changes Used By.” Trivial examples are “The Star-Spangled Banner” uses the melody from “The Anacreontick Song,” and half of the jazz tunes in the world use chord changes from “I Got Rhythm.” Thoughts on either the structure or the property names?

Changing the name of the Musical Artist "Songs" property

The "Songs" property on Musical Artist is confusing as it points to tracks (not songs) and it doesn't include all tracks an artist has made.  In most cases, this property only includes tracks on compilations.  At some point, it may contain all tracks, but for now this is confusing.

 I suggest we rename this to "Other tracks".

This is done; “Tracks Recorded.” I don’t like “Other tracks” since it can be used just fine for tracks on albums by that artist. However, the key has remained unchanged lo these many years, so this is negotiable.

Track contributions

Right now, one can model contributions by an artist who isn’t the primary recording artist on the album level, but there is no way to capture equivalent information at the track level. I have modeled a proposal on sandbox; please check it out and comment in this thread.

I like it.

Very nice.  You may want to remove /common/topic as an included type.

Release label, catalogue ID, and format

Right now, a release is linked with its label (or labels). However, the catalogue number and format are not given.

If we add this information, it will mean a new compound value type, since the catalogue number has to be associated with the label and format. See MusicBrainz’s record for Sgt. Pepper’s Lonely Hearts Club Band for an example of how this would be modeled.

Is this too complex? Should a so-called “release” with multiple formats be forced to split into multiple releases? If so, what happens to the MusicBrainz-correlating IDs, since MusicBrainz is not so careful?

The current model doesn't support MusicBrainz's model either, since we don't correlate release date to label. I had assumed that each label/date/format combination constituted a different "musical release" already. Obviously, a CVT would simplify this somewhat, if we wanted to assert that a "musical release" only implied that the tracks were the same, regardless of when/where/how/by whom it was released.

The one possible issue I see with this would be the "credited as" property, which would most likely differ on foreign releases.  But I don't have any good ideas about dealing with the MB IDs.

I have modeled a release event on sandbox. I’m not keen on the name—I picked it for the release rather than the label/catalog association—but please let me know if you like the model and/or if you have a better suggestion for the name.

And should the release format (78, 45, LP, EP, 8-track, cassette, cassingle, CD, CD-single) be on this release event CVT? That mimics the MusicBrainz model, but if that’s not what we want to do, we could put the format on the release itself and force multi-format releases to split.

To get this straight:

If we put the format on "release event", a "musical album" would only have multiple "musical releases" if the releases had different tracks (either because they were remastered or because releases have different track listings).

If we put the release format on "musical release" instead, then the US, UK, and Japanese CD releases of an album would all be one "musical release", and the simultaneous US, UK, and Japanese LP releases (from the same respective labels) would be different "musical releases".

Is that right?  If so, I think the first one is probably better -- the second one seems confusing to me.

I believe your analysis is right, Jeff, and I agree that it is easier to keep the format, the physical medium, on the release event. It also harmonizes better with the MusicBrainz data. Now I just need to see if it makes anyone else cranky. (-:
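To make the agreed option concrete, here is a sketch in which the format lives on the release event, so a new musical release is only created when the track listing differs. The field names, the `add_event` helper, and all labels and catalogue numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReleaseEvent:
    """Compound value tying label, catalogue number, and format together."""
    label: str
    catalog_number: str
    release_format: str
    country: Optional[str] = None
    date: Optional[str] = None


@dataclass
class MusicalRelease:
    tracks: Tuple[str, ...]
    events: List[ReleaseEvent] = field(default_factory=list)


def add_event(
    releases: List[MusicalRelease], tracks: Tuple[str, ...], event: ReleaseEvent
) -> MusicalRelease:
    """Attach the event to the release with the same track listing,
    creating a new release only when no listing matches."""
    for release in releases:
        if release.tracks == tracks:
            release.events.append(event)
            return release
    release = MusicalRelease(tracks=tracks, events=[event])
    releases.append(release)
    return release


# Made-up events for illustration only.
releases: List[MusicalRelease] = []
listing = ("Side A", "Side B")
add_event(releases, listing, ReleaseEvent("Label A", "A-001", "LP", country="UK"))
add_event(releases, listing, ReleaseEvent("Label B", "B-777", "CD", country="US"))
add_event(releases, listing + ("Bonus",), ReleaseEvent("Label A", "A-002", "CD", country="JP"))
```

The UK LP and US CD end up as two events on one release, and only the issue with the bonus track becomes a second release.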

Date and place of recording tracks

Fans of particular artists or genres tend to care about minutiæ of recordings, including the date and place of a recording.

One way to model this in Freebase would be to have two new properties on Musical Track: “Place,” expecting Location, and “Date,” with a Date/Time value.

Another way to model it would be to attempt to reflect the underlying recording which is mastered into a track. Multiple takes on different days may be reflected in a final track, for instance. This also makes modeling sampling possible.

Thoughts? 

Getting finer grained than track seems like overkill. I'd be happy with Place and Date on Track. How about on Album too? Most albums are recorded in the same place, and most live albums are recorded at the same time.

I think date and location are good properties for tracks in general; I don't have a strong opinion about the depth of the model, though.

This sounds like fun data to me. Good cocktail party discussion. How about using "Place of recording" because this might differ from where it was engineered or mastered. The location property could be added as a CVT with date and permitted to hold multiple entries for instances when multiple takes are included on a single track (an option that probably won't be used that much).

I have modeled these on sandbox. I hedged; the two properties do not use a CVT and have singular names, but they are non-unique. If there is significant user confusion or demand for more complex models, we can always promote the existing values to use a CVT.

This works for me.

How abstract is "track" now?  That is, how often will there be tracks in freebase that are copies of a specific recording session?

A Musical Track should be a bit of recorded sound explicitly instantiated somewhere, whether it’s in a file, on a disc, or on a tape. Some tracks will come from single recording sessions, others will be mixed from multiple sessions, and sample-based tracks might not ever have been really “recorded” at all, at least not by the credited artist.

I'm sorry.  I asked my question badly.  If you are attaching recording-session specific data to a track, would the track be the "definitive" version of that recording, or would there be other instances of tracks that represent that same recording?  In other words, would people be able to find the session data if there are many representations of the recording?

I understand now, Robert. That is kind of the crux of the question; we could be more pedantic and explicitly model masters and/or recording sessions which underlie tracks. That would be a more robust and thorough model, but it would also complicate actual use. Al’s opinion is that the track level is sufficient for now, and as he is more likely to use this than I am, I am inclined to go with that, as we can always garden out an implicit session from a track later on.

Definitely interesting data. I endorse the simple model. Putting the onus on the developer to deduce the answers to some obvious questions seems reasonable:

Did trackA originate from the same session as trackB (by the same artist)? If the location is the same and the date is close (say to the month) then True

Seems easy enough. 
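The same-session check above might look something like this. It is a sketch only: "close (say to the month)" is interpreted here as the same calendar month, and the `Track` fields and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Track:
    name: str
    artist: str
    place_of_recording: Optional[str] = None
    date_of_recording: Optional[date] = None


def probably_same_session(a: Track, b: Track) -> bool:
    """Heuristic: same artist, same place, same calendar month."""
    if a.artist != b.artist:
        return False
    values = (a.place_of_recording, b.place_of_recording,
              a.date_of_recording, b.date_of_recording)
    if any(v is None for v in values):
        # Without both place and date we cannot infer anything.
        return False
    same_place = a.place_of_recording == b.place_of_recording
    same_month = (
        (a.date_of_recording.year, a.date_of_recording.month)
        == (b.date_of_recording.year, b.date_of_recording.month)
    )
    return same_place and same_month


# Made-up tracks for illustration only.
track_a = Track("Track A", "Example Artist", "Studio One", date(1986, 3, 4))
track_b = Track("Track B", "Example Artist", "Studio One", date(1986, 3, 20))
track_c = Track("Track C", "Example Artist", "Studio One", date(1987, 3, 20))
```

A calendar-month test misses sessions that straddle a month boundary; a day-count window would be a reasonable alternative.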

 

Classical works

I have a lot of classical music data (composers and their works, with opus numbers and catalog numbers etc.) which I'd be happy to load up.

However, I note that there are some issues with the schema in this area; e.g. there are separate entities for composition and opera (surely this can't be correct, since an opera is a type of composition), and there is no provision for opus numbers, catalog numbers, etc.

Is anyone working in this area? Anyone have any recommendations for a good, practical way forward?

 

 

This is an interesting start, simonhill.

The Opera domain was done as its own project; the Opera type should probably be refactored to include Composition (and similarly Opera Composer should include Composer).

Some technical notes on your types:

  • The catalog abbreviation and opus number should be text (or even machine-readable strings), not Topics; making D (the topic about the fourth letter of the English alphabet) the abbreviation for Schubert’s catalog is a bit strange, likewise for the number 1.
  • The expected types for other properties could also be adjusted; the “composer” property should expect Composer, “musicologist” should expect Person, or ideally, a new Musicologist type.
  • I would also have your composition include the Composition from the Music domain to make it easier to see what properties are already present and which need to be added. Is a discrete “title” really needed, in addition to the name of the topic?
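As an illustration of the first note, a sketch that keeps the catalog abbreviation and opus number as plain strings rather than links to topics. The class and field names are hypothetical and are not the proposed Freebase types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    abbreviation: str  # plain text such as "D", not a link to a topic
    number: str        # kept as text so suffixed numbers stay representable


@dataclass
class Composition:
    name: str
    composer: str
    opus_number: Optional[str] = None
    catalog: Optional[CatalogEntry] = None

    def label(self) -> str:
        """Build a display label from the name plus any identifiers."""
        parts = [self.name]
        if self.opus_number:
            parts.append(f"Op. {self.opus_number}")
        if self.catalog:
            parts.append(f"{self.catalog.abbreviation} {self.catalog.number}")
        return ", ".join(parts)


work = Composition(
    "Erlkönig",
    "Franz Schubert",
    opus_number="1",
    catalog=CatalogEntry("D", "328"),
)
```

Storing these values as text means "D" never has to resolve to the topic for the letter, and the topic name alone can serve as the title.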

Thanks for getting this started.