System Design

System Design for data processing


Large-scale data processing requires that the total work be structured into a number of functional steps. Because the processing is decentralized and many people are engaged in it, each step is planned beforehand and documented in detail, concepts and definitions are explained, training workshops are organised first centrally and then locally, and mid-course discussions are held to sort out unforeseen data problems. The result is a formal system of data processing. The stages of data processing are:

i) Checking of identification particulars and monitoring of the receipt position.
ii) Hot scrutiny by officers to identify recurring errors committed by field staff at the early stage of data collection
iii) Pre-data-entry scrutiny of schedules for manual checking of important fields
iv) Data entry and 100% verification
v) Phase-I validation (Content Check): This includes preparing lists of inconsistencies, checking those lists against the hard copies of the schedules, updating the data files, and inserting records. The number of such checks for each type of schedule varies from 60 to 150. At this stage:

  • fields are checked against lists of admissible values (codes)
  • arithmetic consistency checks, including range checks and subtotal checks, are carried out on numeric fields and on groups of related fields, both within the same block and across different blocks
  • records relating to the same item or person are located in two or more blocks, and arithmetic/conditional checks are carried out involving different numeric fields from the selected records
  • person-wise or item-wise information furnished in different blocks is matched for a coverage check
  • unit value checks are carried out for quantity and value fields
  • duplicate records are detected
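
A few of the content checks listed above can be sketched in Python. This is only an illustration under stated assumptions: the item codes, the unit-value bounds, the record layout (`item_code`, `quantity`, `value`) and the function name `content_check` are hypothetical and are not taken from any actual schedule or code book.

```python
# Hypothetical code list and unit-value bounds; real ones would come
# from the code book of the schedule being validated.
ADMISSIBLE_ITEM_CODES = {101, 102, 103}
UNIT_VALUE_BOUNDS = {101: (10.0, 60.0), 102: (20.0, 90.0), 103: (5.0, 40.0)}


def content_check(item_records, reported_subtotal):
    """Return a list of (item_code, message) inconsistencies for one block."""
    issues = []
    seen = set()
    for rec in item_records:
        code, qty, value = rec["item_code"], rec["quantity"], rec["value"]
        # Check against the list of admissible codes.
        if code not in ADMISSIBLE_ITEM_CODES:
            issues.append((code, "inadmissible item code"))
            continue
        # Duplicate detection within the block.
        if code in seen:
            issues.append((code, "duplicate record"))
        seen.add(code)
        # Range check on the numeric fields.
        if qty <= 0 or value < 0:
            issues.append((code, "quantity/value out of range"))
            continue
        # Unit value check: value per unit of quantity.
        low, high = UNIT_VALUE_BOUNDS[code]
        if not low <= value / qty <= high:
            issues.append((code, "unit value outside expected bounds"))
    # Subtotal check: item values should add up to the reported subtotal.
    if sum(rec["value"] for rec in item_records) != reported_subtotal:
        issues.append((None, "subtotal mismatch"))
    return issues
```

The returned list plays the role of the list of inconsistencies that is then checked against the hard copy of the schedule.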

v) Phase-II validation (Coverage Check): At this stage,
  - coverage of the data vis-a-vis the directory file is checked for each FSU and SSU
  - absence of any essential block of data is checked
  - duplication of FSU or SSU data is checked
  - consistency of the ID particulars is checked against the directory file
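
As a rough sketch of the coverage checks above, the following Python compares the units present in the data with those listed in a directory. The layout is an assumption made for illustration: the directory is taken to be a set of `(fsu, ssu)` pairs, and the data is taken to hold one record per `(fsu, ssu, block)`, so a repeated triple signals duplication. The function name `coverage_check` is hypothetical.

```python
from collections import defaultdict


def coverage_check(directory, data_records, essential_blocks):
    """Compare the data received with the directory of expected units."""
    blocks_by_unit = defaultdict(list)
    for rec in data_records:
        blocks_by_unit[(rec["fsu"], rec["ssu"])].append(rec["block"])
    received = set(blocks_by_unit)

    report = {
        # Units listed in the directory but absent from the data.
        "missing_units": sorted(directory - received),
        # Units in the data whose ID particulars do not match the directory.
        "unlisted_units": sorted(received - directory),
        "missing_blocks": {},
        "duplicate_blocks": {},
    }
    for unit, blocks in blocks_by_unit.items():
        missing = sorted(essential_blocks - set(blocks))
        if missing:
            report["missing_blocks"][unit] = missing
        duplicated = sorted(b for b in set(blocks) if blocks.count(b) > 1)
        if duplicated:
            report["duplicate_blocks"][unit] = duplicated
    return report
```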

vi) Phase-III validation (Extreme value checking): Here, abnormal or suspect values of variables or derived indices are identified and referred back for checking against the filled-in schedule.
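
The text does not say how abnormal values are identified. One common approach, assumed here purely for illustration, is to flag values lying outside interquartile-range fences. The function name `flag_extreme_values` and the multiplier `k=1.5` are hypothetical choices, not part of the documented procedure.

```python
import statistics


def flag_extreme_values(values_by_unit, k=1.5):
    """Return the unit IDs whose value lies outside the IQR fences.

    values_by_unit maps a unit ID to one numeric value (a reported
    field or a derived index). The IQR rule is an assumed criterion.
    """
    values = list(values_by_unit.values())
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return sorted(
        unit for unit, value in values_by_unit.items()
        if value < low or value > high
    )
```

The flagged units are candidates for checking against the filled-in schedule; the rule itself does not decide whether a value is wrong.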

vii) Special data checking is done by officers who are involved in tabulation. A list of doubtful cases is generated and checked by the DPCs against the filled-in schedules, and the necessary updates are made in the data files.

viii) Computer editing or auto-correction: At this stage,

  1. necessary changes are made to the data, as per a set of guidelines, to make it internally consistent, without referring to the filled-in schedule.
  2. all subtotals/totals are computed for each SSU, and additional records are generated if necessary.
  3. compatibility of the current data is checked against previous years' data in respect of certain important variables.
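
The auto-correction steps above might look roughly like the following for a single SSU. Everything specific here is assumed for illustration: the total is stored as a record with the hypothetical code `TOTAL_CODE`, the comparison uses a single previous figure, and the 50% tolerance is arbitrary. The actual editing guidelines are not reproduced in this document.

```python
TOTAL_CODE = 999  # hypothetical item code marking the total record


def auto_edit(records, previous_total=None, tolerance=0.5):
    """Recompute the total for one SSU and note what was changed."""
    details = [dict(r) for r in records if r["item_code"] != TOTAL_CODE]
    reported = [r["value"] for r in records if r["item_code"] == TOTAL_CODE]
    computed = sum(r["value"] for r in details)

    notes = []
    if not reported:
        # Step 2: generate the additional record when it is absent.
        notes.append("total record generated")
    elif reported != [computed]:
        # Step 1: make the data internally consistent without the schedule.
        notes.append("total record corrected")
    edited = details + [{"item_code": TOTAL_CODE, "value": computed}]

    # Step 3: compare with an earlier figure for the same variable.
    if previous_total and abs(computed - previous_total) / previous_total > tolerance:
        notes.append("total differs sharply from previous figure")
    return edited, notes
```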

ix) Preparation of multiplier files, i.e. calculation of weighting factors for each Ultimate Stage Unit as per the sample design.
x) Preparation of work files, which are extracts from the data to facilitate table generation. Related tables are usually grouped together, and all the data fields required for generating those related tables are extracted into a single work file.
xi) Tabulation of data: Typically, the number of tables (as per the approved Tabulation Plan) to be prepared for each schedule varies from 70 to 200. The tables are usually generated by sector x state x sex x other socio-economic category. Based on the tables generated by DPD, the SDRD prepares subject-wise reports, which are published after due approval.
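
The multiplier, work file and tabulation stages can be illustrated together with a minimal Python sketch. The multiplier formula below assumes a simple two-stage design with equal-probability selection at each stage, which may not match the actual sample design. The function and field names (`multiplier`, `make_workfile`, `tabulate`, `sector`, `state`, `sex`) are hypothetical.

```python
from collections import defaultdict


def multiplier(fsus_in_stratum, fsus_sampled, ssus_in_fsu, ssus_sampled):
    """Weight for an ultimate stage unit under an assumed two-stage design."""
    return (fsus_in_stratum / fsus_sampled) * (ssus_in_fsu / ssus_sampled)


def make_workfile(records, fields):
    """Extract only the fields needed for a group of related tables."""
    return [{field: rec[field] for field in fields} for rec in records]


def tabulate(workfile, by, weight="multiplier"):
    """Weighted count of units for each combination of the `by` fields."""
    table = defaultdict(float)
    for rec in workfile:
        table[tuple(rec[key] for key in by)] += rec[weight]
    return dict(table)
```

For example, `tabulate(workfile, by=("sector", "state", "sex"))` gives the estimated number of units in each sector x state x sex cell, since each sampled unit contributes its multiplier rather than a count of one.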

xii) Release of unit-level data with multipliers posted, along with metadata, for dissemination through the Computer Centre immediately after SDRD releases the Key Indicators reports of the survey.

Basic steps of data processing