Table started by robert for the Data World Commons.
'Mass Data Operation' tracks the large-scale data tasks carried out by the data team.
23,497 Mass Data Operation topics matching:
| name | image | Started Operation | Operator | Ended Operation | article |
|---|---|---|---|---|---|
| Television infoboxes 29 Nov 2006 (example) | Nov 29, 2006 | tristan | Nov 29, 2006 |
This is an example -- not an actual data load
Spreadsheet had 4,353 entries.
31 /tv/shows existed before load
4,388 existed after load
4,357 added during load time.
I believe the discrepancy of 4 was due to end-user additions during the load.
|
|
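The reconciliation in the example entry above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the variable names are made up here and are not part of any load tooling.

```python
# Counts reported in the example entry.
spreadsheet_entries = 4353   # rows in the source spreadsheet
existed_before = 31          # /tv/shows present before the load
existed_after = 4388         # /tv/shows present after the load

# Topics that appeared during the load window.
added_during_load = existed_after - existed_before

# Additions not accounted for by the spreadsheet.
discrepancy = added_during_load - spreadsheet_entries

print(added_during_load)  # 4357
print(discrepancy)        # 4
```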
| Location Refactoring 5 January 2007 | Jan 5, 2007 | colin | Jan 5, 2007 |
Rename "Province" to "Canadian Province"
Rename "Census Designated Place" to "US Census Designated Place"
Co-Type all "US County"
Add and co-type "US City/Town" type
Add "cities" and "counties" to "US State"
Remove "County" types
Remove "State"...
|
|
| Mass typing (phase 1) | |||||
| Basketball Domain Creation 15 January 2007 | Jan 15, 2007 | danm | Jan 15, 2007 |
Schemas created by hand and loaded via the iLoader utility. The source is public-domain information about NBA players and basketball in general.
|
|
| Multiple Domain creation 19 January 2007 | Jan 19, 2007 | jeff | Jan 19, 2007 |
Schemas for the domains book, geography, meteorology, tv, broadcast, and transportation added; schemas for cheese and beverages added to the food domain. Schemas were uploaded using the iloader tool.
|
|
| Created 7bit aliases for compound unicode names 19 Jan 2007 | Jan 19, 2007 | jamie | Jan 20, 2007 |
Operation reviewed all GUIDs for display names that contained a compound Unicode character. The name was recomputed, dropping any combining characters. Back quotes were also removed during this operation. The "cleaned" name was then added to...
|
|
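A minimal sketch of the kind of name cleaning described in the entry above, assuming the names are first decomposed (NFD) so that accents become separate combining characters. The function name `seven_bit_alias` is hypothetical, and the original operation's actual code is not shown in this log. Note that characters with no decomposition (for example "ø") would pass through unchanged in this sketch.

```python
import unicodedata

def seven_bit_alias(name: str) -> str:
    """Drop combining characters and back quotes from a display name."""
    # Decompose compound characters, e.g. "é" -> "e" + COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", name)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Back quotes were also removed in the operation described above.
    return stripped.replace("`", "")

print(seven_bit_alias("Zürich"))
print(seven_bit_alias("Beyoncé"))
```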
| Typed some Topics for the new domains | Jan 20, 2007 | jg | Jan 21, 2007 |
typed 475 tv_network from Infobox TV channel
and 584 from Infobox Network.
typed 2094 tv_show from Infobox television
typed 139 /transportation/road from Infobox Interstate
typed 74 cheeses from Template:Cheese
typed 59 teas from Infobox Tea
|
|
| Added webpages for tv networks and companies | Jan 21, 2007 | jg | Jan 21, 2007 |
added ~5000 company webpages from Infobox Company
|
|
| Added schema to business domain | Jan 26, 2007 | jeff | Jan 26, 2007 |
Schema definition added for business domain (including adding stock exchange to the finance domain and the "employment history" property to /common/person) using iloader.
|
|
| Added company stock symbols |
Added 1439 stock symbols for US Companies from:
Template:Nasdaq
Template:Nyse
Template:NYSE
Template:Amex
Template:NASDAQ
Template:AMEX
|
||||
| Added film release years and IMDB references via article text extraction | Jan 31, 2007 2:00am | jamie | Jan 31, 2007 3:30am |
Approx 17k films now have release dates
|
|
| Typed 36458 Topics via MegaType Process | Feb 1, 2007 9:02:00pm | jamie | Feb 2, 2007 8:10:00am |
Infobox information with natural language verification.
13 id: /american_football/football_coach
48 id: /american_football/football_player
21 id: /aviation/airliner_accident
4033 id: /aviation/airport
2 id: /aviation...
|
|
| 484 City/Town given new names or aliases | Feb 5, 2007 11:18:57pm | jamie | Feb 5, 2007 11:19:32pm |
Hand selected names based on USBGN used to rename certain cities or used as aliases. When replacing the name, if the existing name was not a shortened version of the new name and the old name provided useful context, the existing name was moved to...
|
|
| Added schemas for education and visual art | Feb 2, 2007 | jeff | Feb 2, 2007 |
Uploaded schemas (via iloader) for education and visual art; added adapted and adapted_work types to the common domain.
|
|
| Retyped alternate albums as releases |
All instances of /music/album which had the orig_artist property rather than artist, which is to say that they were believed to be releases of other albums, were retyped as instances of /music/release instead, with their properties reset accordingly....
|
||||
| Added restaurant data | Feb 20, 2007 8:00pm | darin | Feb 21, 2007 1:00am |
added ~100,000 restaurants and locations, reconciled with existing data
typed 1,100 existing topics as restaurants
created new dining domain for restaurant and cuisine types
created business_chain and retail_location types in the business domain
|
|
| Deleted 4488 duplicate birth dates | Feb 21, 2007 11:00pm | jamie | Feb 21, 2007 11:40pm |
/people/person/date_of_birth is not a unique property. About 4700 people topics have two or more birthdays listed.
This is the beginning of an effort to remove duplicate birthdays so /people/person/date_of_birth can be converted to a unique...
|
|
| Education mass typing operation | Feb 20, 2007 1:46pm | jamie | Feb 20, 2007 1:58pm |
(using account wp_typer)
103 id: /education/fraternity_sorority - inserted
2581 id: /education/institution - inserted
2461 id: /education/school - inserted
183 id: /education/school...
|
|
| Composer and lyricist properties moved from track to song | Mar 5, 2007 | crism | Mar 5, 2007 |
The composer property was added to /music/song. All existing uses of the composer or lyricist properties of /music/track were moved: if the track was known to be a recording of a song, the composer and lyricist property values were moved to the...
|
|
| Extended medical schema | Mar 6, 2007 | Mar 6, 2007 |
Extended the medical domain to connect symptoms to diseases, expand information about drugs, and include treatments and medical trials.
|
||
| Added US City/Town co-type to cities that were missed the first time | Mar 6, 2007 12:30am | colin | Mar 6, 2007 12:45am | ||
| Seinfeld episode infobox | Mar 12, 2007 12:10pm | colin | Mar 12, 2007 12:25pm |
Loaded data from the Seinfeld infoboxes.
|
|
| Uplift of non-English Latin-character display names | Mar 12, 2007 6:30pm | crism | Mar 12, 2007 9:07pm |
For all topics that had no English display name, but which had display names in other languages, if the foreign display name was at least 80% Latin characters (including unaccented Latin letters as well as Latin letters with eastern or western...
|
|
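The 80% Latin-character test mentioned in the entry above could be sketched roughly as follows. This is an assumption-laden illustration, not the code that was run: the helper names `latin_ratio` and `is_mostly_latin` are hypothetical, the ratio here is computed over letters only (ignoring spaces, digits, and punctuation), and a letter counts as Latin when its Unicode character name starts with "LATIN".

```python
import unicodedata

def latin_ratio(name: str) -> float:
    """Fraction of the letters in `name` that are Latin letters."""
    letters = [ch for ch in name if ch.isalpha()]
    if not letters:
        return 0.0
    # Assumption: "Latin" means the Unicode character name starts with LATIN,
    # which covers both unaccented and accented Latin letters.
    latin = sum(1 for ch in letters
                if unicodedata.name(ch, "").startswith("LATIN"))
    return latin / len(letters)

def is_mostly_latin(name: str, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the letters are Latin."""
    return latin_ratio(name) >= threshold

print(is_mostly_latin("São Paulo"))
print(is_mostly_latin("東京"))
```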
| 94739 Geolocations Added | Mar 19, 2007 10:25:54pm | jamie | Mar 20, 2007 |
Initial run of the geocode_bot. This bot will run nightly to cover new /location/mailing_addresses that have been created.
|
|
| Glacier Infoboxes | jeff | Mar 20, 2007 12:10pm |
Uploaded data from Wikipedia "Infobox Glacier", including glaciers, locations, type, status, and terminus, but not numeric values. Used account mwcl_infobox. |
||
| 14,000 CityTowns | Mar 20, 2007 4:40pm | jamie | Mar 20, 2007 5:30pm |
Using account: mwcl_wpgeo
Non-US locations. Data was cleaned, selected and organized from: http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Wikipedia-World/en |
|
| Lake Infobox Load | Mar 21, 2007 9:15am | colin | Mar 21, 2007 9:17am | ||
| Artist Infobox Load | Mar 21, 2007 10:00am | colin | Mar 21, 2007 10:02am | ||
| Painting Infobox Load | Mar 21, 2007 10:07am | colin | Mar 21, 2007 12:00am | ||
| NBA Basketball Teams and Rosters | Mar 21, 2007 1:15pm | danm | Mar 21, 2007 1:47pm |
Added NBA team information such as city, coach, date founded, division, conference, league. This process created one new person and typed an additional 28 people as basketball coaches. Also added current rosters to each team. This process created 10...
|
|
| Writer Infobox Loaded | Mar 21, 2007 3:50pm | colin | Mar 21, 2007 3:55pm | ||
| Mountain infobox | Mar 29, 2007 4:30pm | colin | Mar 29, 2007 4:32pm | ||
| Beer Infobox | Mar 21, 2007 5:40pm | colin | Mar 21, 2007 5:42pm | ||
| Brewery infobox | Mar 21, 2007 6:43pm | colin | Mar 21, 2007 6:45pm | ||
| 2861 non-US CityTowns | Mar 22, 2007 9:10:51pm | jamie | Mar 22, 2007 9:12:30pm |
Added as user: mwcl_geonames
Pre-reconciled entries from GeoNames.org
|
|
| Import missing MusicBrainz albums and tracks | Mar 23, 2007 1:03am | crism | Mar 23, 2007 2:00am |
Added 16,082 MusicBrainz albums and associated tracks based on MusicBrainz attributes field of {0,100}; two-valued attributes field was skipped during initial load.
|
|
| Theater Domain created | Mar 23, 2007 5:42pm | jeff | Mar 23, 2007 5:42pm |
Created the theater domain, and loaded the schema via iloader.
|
|
| Capitals from Country Infobox | Mar 23, 2007 1:20pm | colin | Mar 23, 2007 1:25pm | ||
| state capitals with us state infobox | Mar 23, 2007 1:50pm | colin | Mar 23, 2007 1:55pm | ||
| Imported "Infobox Play" data | Mar 23, 2007 | jeff | Mar 23, 2007 |
Imported play and playwright data from WP infoboxes.
|
|
| more country capitals | Mar 23, 2007 5:00pm | colin | Mar 23, 2007 5:05pm | ||
| 111957 objects typed /location/geocode | Mar 24, 2007 10:00pm | jamie | Mar 24, 2007 11:00pm |
user: geocode_bot
Typed objects which were created during the last geocode_bot, mwcl_geonames, mwcl_wpgeo operations. These objects were created and given lat/lon properties, but failed to get typed /location/geocode.
|
|
| Uploaded Data for Tropical Cyclone Categories | Mar 27, 2007 1:47pm | jeff | Mar 27, 2007 1:47pm |
Uploaded a spreadsheet of tropical cyclone category information, including wind force, corresponding Beaufort numbers, and meteorological services. Data was derived from the Wikipedia page Tropical Cyclone Scales. Uploaded using the Metaweb...
|
|
| Move death-related properties from person to deceased person | Mar 28, 2007 5:26am | crism | Mar 28, 2007 5:47am |
Given 100,000 uses of /people/person/date_of_death, it did not make sense to delete and recreate these property values. Accordingly, the death-related properties of /people/person were carefully moved to /people/deceased_person, taking their hints...
|
|
| Musical infoboxes uploaded | Mar 29, 2007 | jeff | Mar 29, 2007 |
Uploaded play, composer, lyricist, bookwriter, actors, and directors from Infobox Musical and Infobox Musical 2; did not link actors and directors to performances. Used mwcl_infobox account.
|
|
| Airports & Airport Codes | Mar 29, 2007 7:00pm | colin | Mar 29, 2007 8:00pm | ||
| Created chemistry domain | typelibrarian | Mar 30, 2007 1:05pm |
Chemistry domain and schema created using iloader.
|
||
| Typed chemical elements | jeff | Mar 30, 2007 |
Typed all chemical elements and uploaded their CAS ids using mwcl_infobox.
|
||
| Baseball Player data from infoboxes | Apr 1, 2007 10:00pm | dm_wikipedia_loader | Apr 1, 2007 11:00pm |
Added 75 baseball players from Wikipedia infobox templates.
|
|
| Tennis player data from Wikipedia infoboxes | Apr 2, 2007 1:29pm | dm_wikipedia_loader | Apr 2, 2007 1:31pm |
Added roughly 270 tennis players extracted from Wikipedia infobox templates. Simple load included birth dates and typed instances as person and tennis player.
|
|
| Moved "chemist" to chemistry domain | jeff | Apr 3, 2007 9:10am |
Moved the chemist type from /science to /chemistry.
|
||
| Opera domain created | jeff | Apr 5, 2007 |
Created the opera domain and schema via schemaloader.
|
||
| Mass typing for opera data | jeff | Apr 5, 2007 |
Mass typed existing Wikipedia articles as opera composer, librettist, opera director, opera company, and opera house. Used the account mwcl_infobox.
|
||
| Music albums merged | Apr 6, 2007 | kurt | Apr 2008 |
15285 /music/album instances were merged.
|
|
| Book Infobox | Apr 11, 2007 2:55pm | colin | Apr 11, 2007 3:10pm | ||
| Television Episode Infoboxes | Apr 11, 2007 5:25pm | colin | Apr 11, 2007 5:35pm | ||
| Wikipedia image import | Apr 12, 2007 7:39:12am | jg | Apr 20, 2007 7:36:53pm |
324560 new images were imported from Wikimedia Commons and English Wikipedia.
373412 topics were given new images. Approximately 84309 topics already had images from prior loads. The name of the image, if provided, is derived from the caption...
|
| Skyscrapers from around the world | Apr 15, 2007 12:25pm | Apr 2008 |
User 'robert' did the load using the data pusher.
|
||
| Literature awards schema | jeff | Apr 15, 2007 1:35pm |
Uploaded (via schemaloader) types for literature awards to the book domain.
|
||
| Re-typed chemical elements | Apr 17, 2007 | jeff | Apr 17, 2007 |
Typed all chemical elements as "chemical element" (again -- data was deleted after last typing) and added CAS IDs, using Metaweb Importer. Used account mwcl_infobox.
|
|