GRC Data Intelligence - Surnames/Family names data

Surnames / Family names data

This data table, built by GRC Data Intelligence, has been created by processing real world data files and counting the occurrences of each surname/family name. As names can be very diverse, the file has been cleaned to exclude obvious errors and names containing initials, given names or additions such as seniority indicators (Sr., Jr. etc.); but as it has been created from real world data, it may contain real world errors. To reduce errors, only those names found five or more times in real world data files are released.

The file contains 707897 records, created from analysing over 39.9 million data records from over 410 sources.

The data contains:

• Unique record number
• Country information (ISO 3166 codes). This applies to the country in which the name was found, not where the name originated
• The string as found. Names are very diverse, and during cleaning strings are assumed correct unless clearly incorrect
• A corrected name. Corrections are only made when there is a clear indication that a name is incorrect and the correct version is obvious. Thus it is never correct to assume that Smyth should be Smith. However, Andr? can clearly be corrected to André
• A count to show the number of times this surname/family name has been found in real world data. The higher this number, the more likely that the string is a name. Only strings found 5 or more times are released in this file

The table attempts to capture what is found in the real world and to lend itself to processing for most uses. Each record is unique by country and name as written (including casing). You can therefore expect to find each name multiple times within the file in this way:

Country  Name   Count
GB       SMITH  5231
GB       Smith  7563
GB       smith  17
GB       JONES  3876
GB       Jones  6252
GB       jones  5
...

Coverage figures are here.
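Because each record is unique by country and name as written, a user who wants one total per name regardless of casing has to merge the casing variants themselves. The following is a minimal sketch of one way to do that, assuming the tab delimited text format and assuming hypothetical column names (country, name, count); the real column names and layout are given in the file documentation, and the function name merge_casing_variants is invented for illustration. The rows are the GB examples shown above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical header and layout: the real field names are defined in the
# vendor documentation. The rows are the GB examples from the description.
SAMPLE = (
    "country\tname\tcount\n"
    "GB\tSMITH\t5231\n"
    "GB\tSmith\t7563\n"
    "GB\tsmith\t17\n"
    "GB\tJONES\t3876\n"
    "GB\tJones\t6252\n"
    "GB\tjones\t5\n"
)


def merge_casing_variants(rows):
    """Sum counts for records that differ only by casing within a country."""
    totals = defaultdict(int)
    for row in rows:
        # casefold() gives a case-insensitive key; the country stays part of
        # the key because counts are recorded per country.
        key = (row["country"], row["name"].casefold())
        totals[key] += int(row["count"])
    return dict(totals)


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
totals = merge_casing_variants(reader)

for (country, name), count in sorted(totals.items()):
    print(country, name, count)
```

Whether merging is appropriate depends on the use: the casing as found may itself be useful information, which is why the file keeps the variants separate.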
Strings which are known not to be surnames/family names, or which are or include given names, or which contain additional information such as forms of address or seniority indicators, are excluded from the release file.

For full information, please refer to the documentation. If you have any questions about this file, please contact us.

Sample

View a sample of 200 records from the file here.

Coverage

View the coverage of this version here. The numbers released are given in the "Found >5 times" column.

Full file documentation is available here.

Formats

Data is held in Microsoft Visual FoxPro format, but can also be provided in these formats: FoxPro 2.x (dBase III+), tab delimited text, pipe delimited text, fixed column width text, and Excel.

Prices

The whole file is available at the price of only EUR 495.

This data is offered on a royalty-free basis for use in any way you wish, with one important proviso: the data may not be copied or distributed in any way whatsoever such that it can, in normal use, be accessed by other users. In other words, if you would like to use this data in your software package, that is allowed provided users cannot get at, or export, the data themselves.

You will be asked to agree to our terms and conditions when purchasing. Our terms, conditions and licensing structure can be viewed here.

To order

To purchase these files, please contact us by e-mail. Delivery will be by e-mail.

If you have any questions about any of our products, or would like to order them, please contact us.

GRC Data Intelligence Expertise in Global Data
