Albums vs. Releases

    1. When we first imported music data, we made some data modeling decisions based on not wanting to over-stress the database. These considerations are no longer relevant.

      One of those decisions was that we don’t have a clean division between albums and releases. We identify a group of releases as all being of the same album; we then pick one of those releases and designate it as the album. The other releases are then releases of that album, but the primary release is implicit, bound up with the album itself.

      I’d like to change this, making the album explicit and separate from the release. The way this would happen is that every album would be duplicated; the new entity would be a release, and would be marked as the first release of the album. The MusicBrainz identification code (which is really tied to the release, not the album) would be migrated to the newly created release.
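A minimal Python sketch of the migration just described. It assumes simplified, hypothetical `Album` and `Release` records. The field names and the `split_primary_release` helper are illustrative, not the actual Freebase schema or migration script.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    name: str
    tracks: List[str]
    musicbrainz_id: Optional[str] = None
    is_first_release: bool = False


@dataclass
class Album:
    name: str
    tracks: List[str]
    musicbrainz_id: Optional[str] = None  # before migration, the album carries the release's ID
    releases: List[Release] = field(default_factory=list)


def split_primary_release(album: Album) -> Release:
    """Duplicate the album into an explicit first release and move the MusicBrainz ID onto it."""
    primary = Release(
        name=album.name,
        tracks=list(album.tracks),            # copy the track listing rather than sharing the list
        musicbrainz_id=album.musicbrainz_id,  # the ID identifies a release, not the album
        is_first_release=True,
    )
    album.musicbrainz_id = None               # the album no longer holds the ID
    album.releases.insert(0, primary)         # keep the first release at the front
    return primary


# Usage with placeholder data ("mbid-123" is not a real MusicBrainz ID).
album = Album("Example Album", ["Track 1", "Track 2"], musicbrainz_id="mbid-123")
first = split_primary_release(album)
```

After the split, the album keeps its own track listing, and the identifier lives only on the release.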

      Any comments?

      1. Chris -- do you have a strawman schema for this to look at?

      2. This would use the existing album and release schemata; this is a data modeling change only in how they are used.

      3. This sounds right to me, and it's consistent with the way Jeff is thinking about the book data. I expect it applies to other kinds of data as well.

      4. This is implemented on sandbox; go poke around any album there, and see how it would look.

      5. I think this works. I do think some fields are redundant between album and release: "release date", "running time", and "label". On the album type, "initial release date" is a useful property; otherwise, I think these belong on release.

      6. More thoughts. I've put in some of the 7" single releases for Billy Bragg's "Greetings to the New Brunette" on sandbox (this link will self-destruct tonight!). It was released in three countries, with occasionally different B sides, which this model captures well. However, it leaves me with a question about what to put in the "compositions" property on "album": just the A-side? The A-side and all B-sides? The A-side and B-side of the first release?

        The other issue I saw is that the producer is credited differently in different countries: John Porter and Kenny Jones in the UK and W. Germany, just John Porter in Spain. Do we have any sense of how common this is, or is this just a fluke? (Chances are that the producers are the same and it was some contractual thing/label screw-up that caused Jones to be dropped on one release, but that's just speculation.)

      7. Jeff, please see the new thread on single modeling.

      8. Now, as for the other questions Jeff raises: the “Compositions” property of Musical Album is really intended to address compositions that span tracks and are complete, such as a symphony recorded as an album. In this case, I wouldn’t bother with the compositions on the album, just on the tracks themselves. As for different producers: I would try to find out who actually produced the album and credit it correctly. My guess here would be that the UK/German release included one track produced by Porter and one by Jones, while the Spanish one included two tracks produced by Porter, or that the A side was by Porter and the Spanish label ignored the B side. Absent better information, though, I would go for the more inclusive credit at the album level. We may need a producer property on Musical Release, though, as this might well come up with bonus tracks on a re-release or a remastered release.
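One way to read the "more inclusive credit" rule, as a sketch: the album-level producer list is the union of the producers credited on each release, in first-seen order. The function name and the dictionary shape are assumptions for illustration. The credits are the ones Jeff reported above.

```python
from typing import Dict, List


def album_producers(release_credits: Dict[str, List[str]]) -> List[str]:
    """Union of producers credited on any release, keeping first-seen order."""
    seen: Dict[str, None] = {}  # a dict doubles as an insertion-ordered set
    for producers in release_credits.values():
        for producer in producers:
            seen.setdefault(producer, None)
    return list(seen)


credits = {
    "UK": ["John Porter", "Kenny Jones"],
    "W. Germany": ["John Porter", "Kenny Jones"],
    "Spain": ["John Porter"],
}
print(album_producers(credits))  # ['John Porter', 'Kenny Jones']
```

A per-release producer property would keep the narrower Spanish credit available while the album shows the union.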

      9. Working backwards… Jeff, I’m fine with changing the “Release Date” property on Musical Album to “Initial Release Date.” I wonder about dropping “Running Time,” though; I find it interesting whether it was a 37-minute album or a 73-minute album. But I could be convinced either way. Anyone else have an opinion?

      10. I think it's a useful property, I just think it makes more sense to be on release, since different releases will have different running times (due to bonus tracks, alternate tracks in different countries, etc.).

      11. To clarify: the "useful property" in my last comment is "running time".

      12. Re producers: one possibility would be to model producer at the track level, which would eliminate the ambiguity (but which is also much harder information to come by).

      13. I think you definitely want "running time" on the release. As to whether it should be on the album as well, I'm a little torn. I agree with Chris that it's potentially useful to have it attached to the album, but given that the model is moving in the direction of more abstraction at the album level, it may not make sense to have it there.

        I think this is one of those cases where inherited properties ("transitive properties"? what were we calling them?) would be handy, e.g., "the running time of a release is the running time of the album unless a value exists for the release."
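A minimal sketch of that fallback rule, assuming releases and albums are plain dictionaries of property values and that `None` means "no value". The `inherited_value` helper and the property name `running_time` are hypothetical.

```python
from typing import Any, Mapping, Optional


def inherited_value(release: Mapping[str, Any], album: Mapping[str, Any],
                    prop: str) -> Optional[Any]:
    """Use the release's own value if it has one; otherwise fall back to the album's."""
    value = release.get(prop)
    if value is not None:  # an explicit value on the release always wins
        return value
    return album.get(prop)  # may still be None if neither level has a value


# Placeholder running times in seconds.
album = {"running_time": 37 * 60}
reissue_with_bonus_tracks = {"running_time": 52 * 60}
plain_reissue = {}
```

Here `plain_reissue` inherits the album's 37 minutes, while the bonus-track reissue reports its own 52.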

      14. I really do wonder about this album modeling issue. I understand why you went with it to begin with, but I think there is a simpler way to model this.

        If you consider each album as a collection of track recordings, you can bypass all the complexities of releases and different versions. Consider the use case: most people want to connect related listening experiences, and that experience is dictated by the recording, not by the metadata associated with the album.

        Singles can be tied to the relevant tracks.
        7-inch singles are versions of 7 tracks.
        12-inch singles are versions of the same 7 tracks.
        Promo releases of an album that were released, then recalled, then re-released under a different mastering engineer are still the same 7 tracks.

        This would allow one to infer a "meta-album" spanning multiple releases. If you have 6 different releases but the tracks are exactly the same, you can infer that this collection is one consistent "album" regardless of how many forms it takes. Working backwards from the album adds a lot of complexity.
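Adam's inference can be sketched in a few lines. This assumes each release is reduced to a list of hypothetical track IDs and that "exactly the same" means the same set of tracks, regardless of running order.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List


def group_identical_releases(releases: Dict[str, List[str]]) -> List[List[str]]:
    """Group releases whose track lists contain exactly the same track IDs (order ignored)."""
    groups: DefaultDict[FrozenSet[str], List[str]] = defaultdict(list)
    for release, tracks in releases.items():
        groups[frozenset(tracks)].append(release)
    return list(groups.values())


releases = {
    "release A": ["t1", "t2", "t3"],
    "release B": ["t3", "t1", "t2"],  # same tracks, different running order
    "release C": ["t1", "t2", "t4"],
}
print(group_identical_releases(releases))  # [['release A', 'release B'], ['release C']]
```

Each resulting group is a candidate meta-album. Note that this only works if identical recordings already share one track ID.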

        If a promo release has different tracks than the "official" release, why are they grouped together? In a purchase decision, these are separate products, even if they have similar components.

        To solve the cluttered-discography issue, you can set a similarity threshold for the view: if more than 70% of the tracks are shared, display the releases as one meta-release.
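A sketch of the 70% rule. "Similar" is not defined precisely above, so this assumes it means the share of tracks two releases have in common, measured against the larger of the two. Releases are clustered greedily against the first release in each cluster. Both choices and the function names are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def track_overlap(a: Set[str], b: Set[str]) -> float:
    """Fraction of shared tracks, relative to the larger release."""
    if not a and not b:
        return 1.0
    return len(a & b) / max(len(a), len(b))


def cluster_releases(releases: Dict[str, Set[str]],
                     threshold: float = 0.7) -> List[List[str]]:
    """Greedily group each release with the first cluster whose representative it overlaps enough."""
    clusters: List[Tuple[Set[str], List[str]]] = []  # (representative's tracks, member names)
    for name, tracks in releases.items():
        for representative, members in clusters:
            if track_overlap(representative, tracks) > threshold:
                members.append(name)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append((tracks, [name]))
    return [members for _, members in clusters]


original = {f"t{i}" for i in range(1, 11)}   # 10 placeholder track IDs
deluxe = original | {"bonus1", "bonus2"}     # 10 of 12 shared: about 0.83
promo = {"t1", "t2", "t3", "promo1"}         # 3 of 10 shared: 0.3
print(cluster_releases({"original": original, "deluxe": deluxe, "promo": promo}))
# [['original', 'deluxe'], ['promo']]
```

Greedy clustering depends on input order. A union-find over all pairs would instead chain releases transitively, which may or may not be what a discography view wants.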

        I only suggest this because why would someone want to find their way back to the album through the tracks, if the tracks do not contain any of the "fun" information?

      15. I'm a bit confused. Isn't what you're proposing pretty much the same as what Chris was proposing? That there is a more abstract "album" that would contain the human-friendly meta-information and that there would be aggregations of releases that were very similar?

      16. Adam, this is, as Robert suggests, where we’re headed. The missing use case is the one where I slap a piece of plastic into my computer’s drive and want to look it up, or I have an audio file that I ripped a long time ago or which I bought on-line, and which I want to look up. (I am not proposing that Freebase should be the lookup service, but having found an identifier from MusicBrainz, how much information can I then glean from Freebase?) In that case, I need a database of separate, distinct tracks and releases. Now, we could aggregate all the relevant MusicBrainz identifiers onto the relevant albums, but I would like to know whether this is the shorter American mix or the longer UK mix, or whether it’s the lousy initial CD release or the gold-remastered high-quality later release.

        As for track identity, which you touch on: truly identical tracks, such as a single version that is the exact same mix as the album version, should really be a single entity in Freebase. But if the track has been remastered or remixed (dance mix, short version, etc.), it should be a different track. It should, however, be the same composition; I look forward to the point where we can get that rich information.
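As a sketch of that identity rule, assume (hypothetically) that each track appearance is keyed by the composition it records plus a label for the specific mix or master. Identical keys collapse into one track entity. A different mix stays a separate track but shares the composition. The `TrackKey` type and the mix labels are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrackKey:
    composition: str  # hypothetical composition identifier
    mix: str          # hypothetical label for the exact mix/master


def collapse_tracks(appearances: List[Tuple[str, TrackKey]]) -> Dict[TrackKey, List[str]]:
    """Map each distinct track entity to the releases it appears on."""
    entities: Dict[TrackKey, List[str]] = {}
    for release, key in appearances:
        entities.setdefault(key, []).append(release)
    return entities


album_mix = TrackKey("song-1", "album mix")
dance_mix = TrackKey("song-1", "dance mix")
tracks = collapse_tracks([
    ("album", album_mix),
    ("7-inch single", album_mix),   # exact same mix: same entity
    ("12-inch single", dance_mix),  # remix: different track, same composition
])
```

The result holds two track entities, both pointing at the same composition.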

      17. I guess what I'm getting at is: why must we associate the "meta-album" with a physical album? I would think that all meta-information should be tied to releases of CDs, with the "album" being just a node. I say this because DVDs about the album could then also be tied to this node, as well as the artist, without our having to incorrectly state that a particular release is the official manifestation of the recorded experience.

        For looking up information about a track, if you have a live album and there is a DVD of that recording in video form, shouldn't we be able to find that connection through Freebase? Wouldn't that connection be easier if "album" was a collection of manifestations, instead of having an official manifestation with many releases related to it? What is gained by associating a recorded experience with an official version of it, especially if that experience may not live completely in the auditory domain?

      18. Adam -- Chris is picking a release that might be considered as the "definitive" release and denormalizing the tracks and other data to the properties on the Album. This is done as a convenience to someone writing an application who's not interested in sifting through the various releases to find a track listing when building a simple application.

        To clarify: The new Album type is a collection of manifestations, where one manifestation is denormalized as a convenience. Even though this information suggests that it's a particular release, the Album should still be used in an abstract way. If you want to attach the Album to a DVD, you can.
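A minimal sketch of that arrangement, under assumed names. The album is a container of manifestations (releases here, though a DVD could be attached the same way), and the track listing of one designated release is copied onto the album purely for convenience. `Manifestation`, `definitive_index`, and `denormalize` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    name: str
    kind: str            # e.g. "CD", "7-inch", "DVD"
    tracks: List[str]


@dataclass
class Album:
    name: str
    manifestations: List[Manifestation] = field(default_factory=list)
    definitive_index: int = 0                         # which manifestation to denormalize
    tracks: List[str] = field(default_factory=list)  # convenience copy only


def denormalize(album: Album) -> None:
    """Copy the designated manifestation's track listing onto the album."""
    if album.manifestations:
        source = album.manifestations[album.definitive_index]
        album.tracks = list(source.tracks)


album = Album("Example Album", [
    Manifestation("first CD release", "CD", ["t1", "t2"]),
    Manifestation("remastered CD", "CD", ["t1", "t2", "bonus"]),
    Manifestation("concert film", "DVD", ["t1"]),
])
denormalize(album)
```

A simple application can read `album.tracks` directly. Anything that cares which pressing it has goes to the individual manifestations instead.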

        I'm beginning to wonder, however, if instead of tracks, an Album should point to "recorded works", which are abstract in the same way.

      19. I would definitely prefer that same level of abstraction on the track level.

        The only reason I hesitate at the idea of a definitive release is that it's much more likely that "releases" will be added to a group of tracks than that another track will be added to a release. Wouldn't it make more sense to populate albums based on the tracks, or "recorded works", and not the other way around? A definitive "recording body"? Releases tend not to share much meta-data with the first album other than information that could really come from the track meta-data anyway. They often have different mastering engineers, release dates, formats, and sometimes even labels.

        For one manifestation to be denormalized as a convenience, it should conveniently have the meta-data we are most likely to use. Since it's the track meta-data that tends to reappear, not the details of any one initial release, it seems to me it's worth bringing track information up for convenience instead of repeating album meta-data.

      20. I am not sure I am completely following all the abstract jargon here, being elbows-deep in concrete instances. (-: This process is running on the production system right now, and will complete tonight. What it means is that (effectively) every album will have at least one release. Each album will have a set of tracks; those tracks will be the same as the track listing for the first known release.

        As for abstract tracks, this is what the notion of Song and Composition are intended to address. MusicBrainz’ next-generation schema will have a notion of mix, recording, arrangement, and composition, for users that want to get to that level of detail, but I think that (for now, at least) that is beyond the scope of Freebase. I could be persuaded otherwise, but part of that persuasion would have to be a large cadre of users prepared to enter that data manually, as it’s not readily available anywhere that I’m aware of.

      21. This operation is done. Every album for which we have any release information from MusicBrainz should now have at least one explicit release.

