MusicBrainz Summit / 11 / Session Notes
Attendees
- Kuno Woudt (warp)
- Pavan Chander (navap)
- Rob Kaye (ruaok)
- Nikki
- Oliver Charles (ocharles)
- Jamie McDonald (jdamcd)
- Nicolás Tamargo (reosarevok)
- CatCat
- Per Øyvind Øygard (Wizzcat)
- Paul Taylor (ijabz)
- Mathias Kunter (mathiaskunter)
- Hilbert Woudt (monedula)
Sponsor representatives
- musiXmatch: Valerio Paolini
- Last.fm: Adrian Woodhead (massdosage)
- Google/Freebase: Micah Saul (micahsaul)
- Zvooq: Andrey Popp (andreypopp)
- BBC: Dave Evans (djce)
Customer introductions
Last.fm
- They have had a lot of personnel changes over the last few years, but would like to re-establish a relationship with MB
- Are looking to switch to NGS schema by the end of the year
- They would like to use MBIDs internally to make communication between incoming data sets easier
- Will consider sharing partial label feed/data
- Might actually solve their artist disambiguation issue soon-ish
Zvooq
- They are a Spotify competitor in Russia that focuses on music released worldwide
musiXmatch
- They are a lyrics database
- They hold worldwide licenses from Sony, Universal, EMI, Warner, BMG, and Kobalt
Freebase
- Freebase is a big data repository of various data sets covering movies, music, sports, people, locations, and others
- http://freebase.com
BBC
- They are looking to finally make the switch to NGS
- Their music news website now uses ws/2
- They outsource their album reviews and MB data entry to Unique Broadcasting Company
Discussions
Friday (Oct 14)
Single sign on & password security
Goals
- Not storing plaintext passwords
- Not having knowable (i.e. reversible) passwords
- Not transmitting passwords in the clear
- Single sign on
Questions
- What specific password issues are we trying to solve?
Discussed proposals
- Implement OpenID
- Using digest authentication (still requires storing and transferring the clear text password)
- Using SSL (requires updating web service libraries)
- Using a separate LDAP server (password no longer in MB database and stored elsewhere, also allows for possible single sign on integration)
Conclusion: Use LDAP and phase in SSL to increase password security. Bonus: LDAP makes single sign on possible.
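The first two goals (no plaintext and no reversible passwords) are commonly met by storing only a salted, deliberately slow hash. The sketch below illustrates that idea only; it is not the LDAP plus SSL setup chosen above, and the function names and the iteration count are assumptions made for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor, not a value from the notes


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Note that this only covers storage. Not transmitting passwords in the clear still needs SSL or a similar transport.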
Saturday (Oct 15)
Cover art archive
- Universal is considering handing over their entire cover art archive to us
- Labels actually don't own copyright on cover art
- There are potential messy legal issues to using cover art
- The Internet Archive functions as a library and can act as a 'cover art shelter' for us
- Possible process:
- If you know a release's MBID, you can do a GET with it and receive a cover art image
- Track cover art uploads by user and also use regular voting process
- Images will be provided as a hi-res (~15 MB) and as a low-res (500 px)
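A minimal sketch of the "GET by MBID" idea, assuming a URL layout that the notes do not specify. `BASE_URL`, the path segments, and the function name are hypothetical; only the two image sizes come from the discussion.

```python
import uuid

BASE_URL = "https://coverart.example.org"  # hypothetical host, not from the notes
SIZES = {"hi-res": "original", "low-res": "500"}  # assumed path segments


def cover_art_url(release_mbid: str, size: str = "low-res") -> str:
    """Build the URL a client would GET to fetch cover art for a release."""
    # MBIDs are UUIDs; uuid.UUID raises ValueError for malformed input
    # and str() normalizes to lowercase hyphenated form.
    mbid = str(uuid.UUID(release_mbid))
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    return f"{BASE_URL}/release/{mbid}/{SIZES[size]}"
```

Validating the MBID before building the URL keeps malformed identifiers from ever reaching the image server.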
Questions
- Does the user have to upload JPEG or can the server transcode?
- What status code will we return when a 'darkened' image exists but we're not allowed to display it?
- If we get cover art from Universal, how do we match each image up with a release?
- How do we handle a release group with many releases (i.e., do we use the same image?)
- How do we handle multiple images (e.g. front, back, obi, liner notes, CD faces, etc.)?
Data quality
- See Sunday
Edit system
Goals
- Allow grouping edits together and bulk submitting them
- Allow editing an edit and resubmitting it without impacting the edit queue
- Allow editing via the web service, eventually
BookBrainz
- Oliver's pet project and testing ground for future MB framework changes
- BB emulates git
- It allows building a stack of changes and then submitting all of them together in one 'commit'
- It takes a snapshot of the data at the time. We don't have that with historical edits so migrating old edits is a problem
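The "stack of changes submitted as one commit" idea can be sketched as follows. This is an illustration under assumptions, not BookBrainz code: the class names, fields, and the shape of the returned bundle are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    entity: str      # illustrative identifier, e.g. "release:1234"
    field_name: str
    new_value: str


@dataclass
class ChangeStack:
    """Collects changes locally; nothing reaches the edit queue until commit()."""
    pending: List[Change] = field(default_factory=list)

    def add(self, change: Change) -> None:
        self.pending.append(change)

    def amend(self, index: int, new_value: str) -> None:
        # Editing a pending change happens before submission,
        # so the edit queue is not affected.
        self.pending[index].new_value = new_value

    def commit(self, message: str) -> dict:
        """Bundle all pending changes into one submission and clear the stack."""
        bundle = {"message": message, "changes": list(self.pending)}
        self.pending.clear()
        return bundle
```

A real implementation would also record a snapshot of the data at commit time, which is the part that makes migrating historical edits hard.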
Further reading
- wikipedia:Inter-rater_reliability (especially the links at the end)
- http://www.mitpressjournals.org/doi/abs/10.1162/COLI_a_00074
Web service
- Roll out 3scale and move all commercial users over to a pay2play system with different packages
- Non-commercial users would use the free2play rate-limited system with the option of paying for better access
Audio fingerprinting
- We all hate PUIDs and we need to move forward
- Acoustid looks very promising, it's open source, file oriented, and has strong ties with MB
- http://acoustid.org/
- May be possible to bulk fingerprint some data sources
Concert support
- Do we go with one provider or several?
- Start with Songkick, but stay open to the option of different providers - especially to gain global coverage
- Do we concentrate on future events or archived events?
- Initially link to Songkick for future events
- Create a new setlist entity for past events
- Create a new venue entity
- Need to consider a location entity, which would be useful for artists as well as for events
Tracks vs recordings (vs works)
- Similar to the remaster issue
- Do we add further levels of abstraction?
- No. We're already saturated with entities. We need better definitions
- ...and we still haven't totally defined works
- Do we count silence as a divergence point?
Service segregation
- Announce the closing of Trac (and all its tickets) and the deprecation of Subversion
- svn.musicbrainz.org will remain as an interface for the search server
- Consider replacing gitweb with GitHub in a more official capacity
Genres
- A new field that is to be used specifically for genres
- Features: autocomplete, canonical names
- Micah is offering genre data based on Wikipedia
Product offering
- This is not a complete or final model and not official!
- "Drug dealer" model - free the first time, get addicted, pay for easy further access
- Data dumps (twice a week)
- Public $100 (suggested)
- CC-NC $250 (paying for commercial use of NC-licensed data)
- Live data feed ($/mth)
- Twice-weekly $500
- Daily $1500
- Hourly $2500
- Web service calls (flat fee)
- 10K $10
- 25K $20
- 50K $30
- 100K $50
- Virtual machine
- VM + Data $300
- VM + Data + Search $400
- Tagger Affiliate Program
- TBD: Clarification of the scope of the program
- TBD: Web service referral kickbacks
Sunday (Oct 16)
3rd party data set integration
- Lyrics from musiXmatch
- Daily updates, but will start with weekly ones
- Updates will include all MB/mXm matched lyrics
- Lyrics can also be added from the edit interface
- How do we best use their lyrics data?
- Solution: Link to mxm via a lyrics icon in the tracklist and a proper link on the recording page
- See also Monday
Tracklist/medium overhaul with video support
- Videos are becoming increasingly common as a music release medium (e.g. iTunes)
- Will require major schema changes and looking at the long term goals of MusicBrainz
- Solution: Table the discussion for now, reopen in a different setting with developers
Group multiple release events (country+date) together
- There is a need to group multiple releases together when they are identical and differ only in the country of release
- Due to tradition, different countries/regions issue releases on different days of the week
- Solution: Allow multiple release events per release when the label, barcode, and tracklist are the same
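One way to picture the proposed solution is a release that carries a list of (country, date) events. The sketch below is an assumption about shape only, not the MusicBrainz schema; all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ReleaseEvent:
    country: str  # e.g. "GB"
    date: str     # ISO 8601, e.g. "2011-10-17"


@dataclass
class Release:
    label: str
    barcode: str
    tracklist: Tuple[str, ...]
    events: List[ReleaseEvent] = field(default_factory=list)


def same_product(a: Release, b: Release) -> bool:
    """True when label, barcode, and tracklist all match."""
    return (a.label, a.barcode, a.tracklist) == (b.label, b.barcode, b.tracklist)


def merge_events(a: Release, b: Release) -> Release:
    """Fold two identical releases into one release with multiple events."""
    if not same_product(a, b):
        raise ValueError("releases differ; keep them separate")
    events = sorted(set(a.events) | set(b.events), key=lambda e: (e.date, e.country))
    return Release(a.label, a.barcode, a.tracklist, events)
```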
Date improvements
- Unknown end date (dead/disbanded, but we don't exactly know when)
- Solution: Add a column to the date table to specifically state that the entity is dead/disbanded, but we don't know when
- Fuzzy dates (16th century composer edge cases)
- Solution: Use a 'century' column
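The two solutions above (an explicit "ended" flag and a century column) could look like this in a data model. This is a sketch under assumptions: the field names and the `describe_end` helper are hypothetical and do not reflect the actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FuzzyDate:
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    century: Optional[int] = None  # e.g. 16 for "16th century"


@dataclass
class Lifespan:
    begin: FuzzyDate = field(default_factory=FuzzyDate)
    end: FuzzyDate = field(default_factory=FuzzyDate)
    ended: bool = False  # True when dead/disbanded, even if `end` is empty


def describe_end(span: Lifespan) -> str:
    """Summarize what is known about the end of an entity's lifespan."""
    if span.end.year is not None:
        return f"ended in {span.end.year}"
    if span.end.century is not None:
        return f"ended in century {span.end.century}"
    return "ended, date unknown" if span.ended else "no end recorded"
```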
Data quality
- User:Wizzcat/Data_Quality_Extension
- http://wiki.xabbu.net/Data_quality
- Current implementation of data quality has a bad name, is poorly defined, and isn't used
- What do we want to solve?
- Explicitly state that a release has been reviewed/verified
- Solution: +1 / -1 votes that decay in weight over some function of time
- Protect against ignorance (The White Album vs The Beatles)
- Solution: Add a 'Protected' flag (i.e. edits expire by default)
- Measure of completeness
- Solution: "Completed as per liner notes" checkbox that is accessible via the WS
- Conclusion: High quality is the protected flag, default quality is default, low quality goes away
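The "+1 / -1 votes that decay in weight" idea can be sketched with an exponential decay. The notes leave the decay function open, so the half-life below and both function names are assumptions chosen for illustration.

```python
from typing import Iterable, Tuple

HALF_LIFE_DAYS = 365.0  # assumed; the notes only say "some function of time"


def vote_weight(age_days: float) -> float:
    """Weight of a vote that is `age_days` old; halves every HALF_LIFE_DAYS."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def review_score(votes: Iterable[Tuple[int, float]]) -> float:
    """Sum of decayed votes. Each vote is (value, age_days) with value +1 or -1."""
    return sum(value * vote_weight(age) for value, age in votes)
```

With this choice a fresh +1 outweighs a year-old -1, so a recent verification counts for more than an old complaint.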
Release group attributes
- Currently, 'remix' and 'soundtrack' are at the same meta level as 'album' or 'lp'
- Conclusion: Postponed till a proposal can be drawn up
Reports
- Improve the explanation that is shown at the top of each report's page
- Improve report flow (e.g. ability to hide items from reports)
- Allow marking an entry as 'done'
- Default report list should filter out all entries marked as done with more than X votes
- Allow viewing the report with the filtered out entries
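The default filter described above (hide entries marked as done by more than X users, with an option to show them again) could be sketched like this. The `ReportEntry` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReportEntry:
    name: str
    done_votes: int = 0  # number of users who marked the entry as done


def visible_entries(
    entries: Iterable[ReportEntry], threshold: int, include_done: bool = False
) -> List[ReportEntry]:
    """Return entries for display; by default hide those with more than `threshold` done votes."""
    if include_done:
        return list(entries)
    return [e for e in entries if e.done_votes <= threshold]
```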
Site notifications + subscriptions
- List all emails in a site inbox
- Create a dynamic list of subscribed artists with open edits
Testing
- As finances improve, employ a dedicated person that will lead the testing
Pagination
- Filter on release group properties
- Use infinite scroll
- Be able to reorder, add, remove, and sort columns
Medium attributes (12" vinyl, 8 cm CD)
- Switch from a hierarchical tree to attributes
Music dashboard
- What information should we show?
Instrument tree
- Change from a tree to a graph
- Flatten the graph into a tree and allow an instrument to have multiple parents
- Add model support to the instrument tree
- Importing Freebase data
- How often do we sync the data?
- How do we reconcile differences in data?
- How often do deletes/merges/changes happen?
- Going forward, if we need a new instrument we would add it to Freebase
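A small sketch of an instrument graph in which an instrument may have several parents, and of flattening it into tree paths for display. The instrument names and the `parents` mapping are illustrative only and are not taken from MusicBrainz or Freebase data.

```python
from typing import Dict, List


def paths_to_root(instrument: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Return every root-to-instrument path.

    Each path is one place where the instrument would appear if the
    graph were flattened into a tree. Assumes the graph has no cycles.
    """
    direct = parents.get(instrument, [])
    if not direct:
        return [[instrument]]  # no parents: this is a root
    return [
        path + [instrument]
        for parent in direct
        for path in paths_to_root(parent, parents)
    ]


# Illustrative data: "electric guitar" has two parents.
EXAMPLE_PARENTS = {
    "electric guitar": ["guitar", "electrophone"],
    "guitar": ["string instrument"],
}
```

Storing only child-to-parent links keeps the data a graph, while the flattened paths give a tree view with the instrument listed under each parent.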
Universal Music Group International
- "I am very happy to declare Universal's support for MusicBrainz and its community" - Innovation Manager at Universal Music Group International
Release editor
- Default tracklist page shows the advanced view
- For new releases you see the add disc dialog
- The track parser moves into the add disc dialog
- There needs to be a way to reparse from the advanced view
Wiki
- Remove unneeded extensions
- Update to Ubuntu's MediaWiki package
- Get the API working
- Install wiki at /wiki/Article and then redirect to /Article
- Write a wiki test suite
Monday (Oct 17)
Initial dates on release group
- Last.fm would like to create 'best of the decade' lists and filter out data such as the 2009 re-release of The Beatles
- Currently, release group dates match the date of the earliest release in that group, but in the case of re-releases we often only have data on the modern release and are missing (for example) the original '70s vinyl release
- Solution: Add an editable initial date field at the release group level
- The date field will default to empty because anyone who wants the group date can guess via its earliest release (like MB does now)
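The fallback described above (use the editable initial date when set, otherwise guess from the earliest release) can be sketched as follows. The function name is hypothetical, and dates are assumed to be ISO 8601 strings so that plain string comparison orders them.

```python
from typing import Iterable, Optional


def release_group_date(
    initial_date: Optional[str], release_dates: Iterable[Optional[str]]
) -> Optional[str]:
    """Prefer the explicit initial date; otherwise fall back to the earliest known release date."""
    if initial_date:
        return initial_date
    known = [d for d in release_dates if d]  # skip releases without a date
    return min(known) if known else None
```

For a group whose only catalogued release is a 2009 reissue, setting the initial date to the original year gives partners such as Last.fm the value they need for decade lists.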
musiXmatch
- Short description of musiXmatch expectations
- Feedback from MB editors on musiXmatch contributions
- Editors' willingness to help musiXmatch (IRC channel)
- musiXmatch will report unexpected edit interface behaviour (for example, split artists while adding a release)
- Change usernames to make them easily identifiable (add customer name to username)
- Provide guidelines for interactions between MB editors and external editors
3rd party data set integration
- How do we properly link to different data sets? (e.g., musiXmatch, SoundUnwound, Last.fm, etc.)
- Solution: Build a generic framework that allows us to import any external data set and reconcile it with the data we have
- Use a second "integration database" that contains all raw data from external sources (label feeds, partners, etc.)
- Import data into the main database with a de-duplication script, but do not remove any of the original raw data (this allows further parsing in the future)
- Also look into Google Refine for manual reconciliation: http://code.google.com/p/google-refine/
- A long term goal is to create an editing API that we can gradually open up to our data partners and the ecosystem
- This will allow partners like Zvooq to edit data on their website, but feed the changes back to the rest of the MB ecosystem
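The two-database idea (keep every raw record, import a de-duplicated view into the main database) can be sketched in memory as below. The matching rule (case-insensitive artist and title), the record fields, and the function names are assumptions made for illustration; real reconciliation would need far more careful matching.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]


def match_key(record: dict) -> Key:
    """Naive de-duplication key: normalized artist and title (assumed rule)."""
    return (record["artist"].strip().casefold(), record["title"].strip().casefold())


def import_records(
    source: str,
    records: Iterable[dict],
    raw_store: List[dict],
    main_db: Dict[Key, dict],
) -> int:
    """Archive every raw record, then add only unseen ones to the main database.

    Returns the number of new main-database entries.
    """
    added = 0
    for record in records:
        # The raw data is never discarded, so it can be re-parsed later.
        raw_store.append({"source": source, "data": dict(record)})
        key = match_key(record)
        if key not in main_db:
            main_db[key] = {
                "artist": record["artist"].strip(),
                "title": record["title"].strip(),
                "sources": [source],
            }
            added += 1
        elif source not in main_db[key]["sources"]:
            main_db[key]["sources"].append(source)
    return added
```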
Feature prioritization
Feature (votes)
- Edit system (9)
- Group multiple release events together (6)
- Data quality (6)
- 3rd party data set integration (5)
- Single sign on & password security (5)
- Instrument tree (4)
- Genres (4)
- Medium attributes (4)
- Release group attributes (3)
- Music dashboard (2)
- Tracklist/medium overhaul with video support (1)
- Pagination (1)
- Site notifications (1)
- Report improvements (1)
- Date improvements (0)
- Auto-editor elections (0)
- Full classical support