Discussions on File Format
-
This type is being used as a catch-all for everything from low-level encodings, which aren't file formats at all, to top-level container formats. For example, mu-law, PCM, ADPCM, G.721, etc. are all audio encodings or families of audio encodings that are always, at least these days, wrapped in some container format before being written to disk, so I don't consider them to be file formats at all.
It feels messy, but perhaps the flexibility outweighs the value of finer-grained modeling. What do others think?
-
I also think this is a bit too messy. For example, many of the entries are in fact format families (e.g. PDF) and so do not provide an easy way of identifying different versions of the format (e.g. PDF 1.5 does not have a Freebase URI). Perhaps we could add a 'Bitstream Encoding' entity (which may be embedded in another, or may act as a stand-alone file) and then point the current Format entities at these. For example:
PDF . has_version . [Bitstream Encoding: PDF 1.6]
But perhaps this will turn things people think of as Formats into Encodings and they won't be able to find them?
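To make the proposal a bit more concrete, here is a rough Python sketch of the family/version split. The `BitstreamEncoding` class, the `standalone` flag and the `find_family` helper are hypothetical names made up for illustration, not anything that exists in the schema today. It also hints at one way around the findability worry: look an encoding up through the family that lists it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BitstreamEncoding:
    """A specific, identifiable encoding such as 'PDF 1.6' (hypothetical type)."""
    name: str
    # True if it can act as a stand-alone file, False if it is only ever embedded.
    standalone: bool = True


@dataclass
class FileFormat:
    """A format family such as 'PDF', pointing at its concrete versions."""
    name: str
    has_version: List[BitstreamEncoding] = field(default_factory=list)


pdf = FileFormat(
    "PDF",
    [BitstreamEncoding("PDF 1.5"), BitstreamEncoding("PDF 1.6")],
)
# An audio encoding that is normally wrapped in a container before hitting disk.
mu_law = BitstreamEncoding("mu-law", standalone=False)


def find_family(encoding_name: str, formats: List[FileFormat]) -> Optional[str]:
    """Return the name of the family that lists the given encoding, if any."""
    for fmt in formats:
        if any(version.name == encoding_name for version in fmt.has_version):
            return fmt.name
    return None
```

With this shape, `find_family("PDF 1.6", [pdf])` leads back to the PDF family, while something like mu-law simply has no family entry.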
-
Cross-posting to the domain so that this discussion is more visible.
-
sandos's type Serialization is probably relevant here to help distinguish format models from their serializations.
Versions of formats should probably just be modeled as separate topics, particularly if there are any significant compatibility issues.
The extended_from property can be used to link things together, but it's not always 100% accurate for families of things which are more related by branding or continuity than technical compatibility. Not sure how finely this needs to be modeled though...
-
Sorry for the delay in replying - I didn't get any notification email, and I'm not sure why as I've allowed it in my profile.
I agree that we should distinguish between models and their serialisations, although sandos's type still does not distinguish specific versions of an encoding (e.g. it covers XML rather than XML 1.0). That would imply three levels: Specification/Model, File Format, Serialisation/Encoding/Version.
I'm not quite sure how to align the Model/Specification with the more common file formats. For RDF, there is a well-specified model and a range of encodings. For things like 'Images', the model is less well specified (rather, the core concepts are well-specified, but there are a lot of additional complications that vary between formats), so the mapping is not clear.
I agree that versions of formats should be separate topics (as I would like URIs for each), but I'm not sure if I would prefer different instances of the 'File Format' type, or a new 'Serialisation' or 'Encoding' type. I suspect that, for now, it might work better if we augment the File Format schema and use that for both file format families and specific encodings. We could add fields to distinguish them, and let things develop for a while before we attempt to prise them apart into separate entities.
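As a rough illustration of the "one type plus a distinguishing field" idea, something along these lines might do. The `kind` and `family` fields and the `versions_of` helper are assumptions invented for this sketch, not existing File Format properties.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FormatKind(Enum):
    FAMILY = "family"                        # e.g. PDF, XML
    SPECIFIC_ENCODING = "specific_encoding"  # e.g. PDF 1.5, XML 1.0


@dataclass(frozen=True)
class FileFormat:
    """Single type used for both families and specific encodings."""
    name: str
    kind: FormatKind
    # Name of the family a specific encoding belongs to (hypothetical field).
    family: Optional[str] = None


TOPICS = [
    FileFormat("PDF", FormatKind.FAMILY),
    FileFormat("PDF 1.5", FormatKind.SPECIFIC_ENCODING, family="PDF"),
    FileFormat("PDF 1.6", FormatKind.SPECIFIC_ENCODING, family="PDF"),
    FileFormat("XML", FormatKind.FAMILY),
    FileFormat("XML 1.0", FormatKind.SPECIFIC_ENCODING, family="XML"),
]


def versions_of(family_name: str, topics: List[FileFormat]) -> List[str]:
    """List the specific encodings recorded under a given family."""
    return [
        topic.name
        for topic in topics
        if topic.kind is FormatKind.SPECIFIC_ENCODING
        and topic.family == family_name
    ]
```

Each version still gets its own topic (and so its own URI), and splitting the two kinds into separate types later would amount to filtering on `kind`.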
The semantics of 'extended_from' are not clear to me. It has no description in the File Format type, and does not indicate how it should be used. For me, it is critical to be able to distinguish between direct super/subset relationships of encodings (i.e. XHTML is also XML, but a more specific subset - technical compatibility if I understand you correctly) and other relationships (e.g. HTML5 is a later version of HTML4). Without clear guidance on what extended_from means, the data will become chaotic.
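To show the kind of distinction I mean, here is a small sketch in which the single extended_from link is replaced by two explicitly named relations. The relation names `SUBSET_OF` and `LATER_VERSION_OF` and the `is_also` helper are my own assumptions for illustration, not properties of the current schema.

```python
from enum import Enum
from typing import List, Tuple


class Relation(Enum):
    # Every valid instance of the subject is also a valid instance of the object.
    SUBSET_OF = "subset_of"
    # Branding or continuity only; no technical compatibility is implied.
    LATER_VERSION_OF = "later_version_of"


Link = Tuple[str, Relation, str]

LINKS: List[Link] = [
    ("XHTML", Relation.SUBSET_OF, "XML"),
    ("HTML5", Relation.LATER_VERSION_OF, "HTML4"),
]


def is_also(source: str, target: str, links: List[Link]) -> bool:
    """True if target is reachable from source by following SUBSET_OF links only."""
    seen = set()
    frontier = [source]
    while frontier:
        current = frontier.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(
            obj
            for subj, rel, obj in links
            if subj == current and rel is Relation.SUBSET_OF
        )
    return False
```

Here `is_also("XHTML", "XML", LINKS)` holds, while `is_also("HTML5", "HTML4", LINKS)` does not, because a later version is not treated as technically compatible with its predecessor.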
I would really like to work out how to move this forward. It seems that the user who created the File Format type (superkurt) is long gone. How do we help make it better? Who should we talk to?
-
Please reciprocate the created_by, written_by, and read_by properties onto Software Developer and Software respectively so that the information is visible from that end of things as well.
-