Role of location of household and its socio-economic status on energy consumption dynamics in rural Nepal: a categorical data analysis

Received Apr 5, 2019 Revised Feb 19, 2020 Accepted Mar 20, 2020
This study is based on data collected from two sample surveys, namely a survey of 300 households using national grid energy and a survey of 400 households using biogas. The surveys were conducted in three different rural settings of Nepal. The responses to questions were classified into multiple choice options. This generated categorical data and reduced ambiguity and confusion between interviewer and interviewee. The data were classified on an ordinal scale and modelled. As the dependent variable had more than two categories, polytomous rather than dichotomous models were developed and fitted. Ten different hypotheses assessing and measuring the energy consumption dynamics were tested. The parameter values of these models and the odds ratios are used to quantify the impact of change with respect to energy consumption. The variables considered were time spent in the collection of firewood, type of house, amount of firewood saved, time saved, and whether the employer and the school are located within 15 minutes' distance. Such data-based studies are crucial for a country like Nepal, which lacks a strong backbone of accurate and regularly updated official records. They can be generalized to other countries of Asia and Africa. The results provide guidelines to policy makers and planners for formulating realistic energy policies for such countries.


INTRODUCTION
Developed countries need immense energy sources to keep pace with rapid development, whereas developing countries, marked by long hours of power outages, need continuous electricity for social and economic development. The cost of exploiting ever-depleting fossil fuel reserves is rising, and the resulting pollution has an adverse impact on the ambient environment. There is therefore a growing urgency to switch over to alternative and renewable sources of energy.
Total energy consumption of Nepal in the year 2008/09 was about 9.3 million tons of oil equivalent (401 million GJ), of which 87 percent was derived from traditional resources, 12 percent from commercial sources and less than 1 percent from alternative sources [1]. Here traditional energy resources include fuel wood from forests and tree resources, residues from agricultural crops, and animal dung in dry form. Energy resources coming primarily from coal, grid electricity and petroleum products are termed commercial, whereas biogas, solar power, wind and micro-level hydropower are categorized as alternative energy resources in Nepal. Nepal has an agriculture-based economy in which cattle are kept for their draft power and milk. Biogas generates energy from the dung produced by farm animals and also from agricultural wastes. Thus biogas is a very suitable alternative energy source for Nepal. The potential for biogas production is about 1.9 million plants, of which 57% are in the Terai and 43% in the hills and mountains. Biogas plants were promoted by the Government of Nepal in the agriculture year 1974/75 as part of a special program. This program installed 250 plants in different parts of Nepal under the supervision of governmental and non-governmental agencies. As part of the alternative energy year 2009/2010, the Government of Nepal planned to install 100,000 biogas plants in 70 districts [1]. There were 201,775 biogas plants installed in various districts of Nepal in 2009 [1, p. 60]. According to the 2011 census, 131,596 households use biogas for cooking, of which 14.5% are in urban areas and 85.5% are in rural areas. This is 2.52% of all households. Households depending on wood/firewood for cooking make up 63.99%. Kerosene is used for cooking in 1.02% of households, whereas LPG is used in 21.03% for the same purpose. Kerosene is used as a source of lighting in 18.28% of households. Similarly, 67.2% of households use electricity as a source of lighting [2].
Evidence-based studies are very important for many countries because they lack a backbone of strong, good-quality official data. Special statistical techniques have to be developed and applied for such countries with limited and scarce data. The factors that play a critical role in energy consumption dynamics need to be identified and quantified. Energy use also has several intangible advantages, especially in rural areas, and these outweigh the directly attributable benefits.
In this paper the impact of various factors governing the energy consumption dynamics of rural households is quantified. Categorical data are analysed and polytomous models are developed. The work is novel in that statistical methods are used both in the generation of the categorical data and in their in-depth analysis, an approach that is uncommon in energy research. The method is useful where accurate measurement instruments are lacking. It is also useful where, due to lack of awareness, respondents cannot furnish exact answers but can correctly choose among multiple choice options. Here the dependent variables are classified into more than two categories and are hence polytomous and not dichotomous. We interpret the parameters of these models. We also calculate odds ratios and odds in favour to quantify impact. Categorizing data into groups that can later be reduced to ordinal data has reduced the chance of error due to ambiguous responses. Thus the dynamics of change of variables related to the energy consumption of 700 households, such as time spent in the collection of firewood, type of house, amount of firewood saved, time saved, and employer and school located within 15 minutes' distance, are minutely analyzed. The data are based on a sample survey of 300 households of normal users, that is, national grid energy users, and 400 households of biogas users.
Devkota [3] has discussed and developed several statistical methods for countries with limited and scarce demographic data. This approach has wide applicability in various interdisciplinary fields. Saleh et al. [4] implemented statistical optimization of parameters and conditions to reduce the chemicals and reagents consumed in experimental work. Kanak et al. [5] applied statistics and artificial intelligence to develop models aimed at reducing the risk of incidence of blood-related anemia. Statistical analysis was used in the evaluation of corrosion-resistant steel bars in sustainable building construction by Imam et al. [6]. Rui et al. [7] analysed the causes, characteristics and consequences of tunnel fire accidents in China using statistical methods. Pacheco et al. [8] analyzed the compressive strength of three Portuguese cement brands using probabilistic modelling. Bhattacharyya [9] studied the access to energy of India's poor. Zhang et al. [10] presented an overview of the energy consumption pattern from available data and analysed some relevant aspects of energy policy in rural China. Petrides and Furnham [11] analyzed several dimensions of human emotional intelligence with exploratory factor analysis. Similarly, Garcia et al. [12] used multivariate statistics to estimate the theoretical, technical and economic potentials of biomasses for bioenergy production. They used hierarchical agglomerative clustering and principal components. Sun et al. [13] used stochastic processes to select a suitable scenario from a large number of scenarios representing the variability of operating points. This was used in transmission network expansion planning. Xu et al. [14] used a multivariate statistical regression model to accurately predict the output power of photovoltaic grid-connected power generation.
The rest of the paper is organized as follows. Section 2, titled research method, gives the theoretical background and the hypotheses. Section 3 presents the results and discussion, and section 4 concludes.

RESEARCH METHOD
In many studies, categorical data are the only practical means of obtaining accurate data. Categorizing data into several groups reduces the chance of ambiguous and erroneous data. This is especially true for countries with limited data. Where awareness among stakeholders is lacking and the available database is very limited (and often unreliable), categorical data minimize the ambiguity of response between interviewee and interviewer. These data are also useful when the measurement instruments used for data collection are not very precise or accurate. Identifying all the possible responses to a question and then putting them into categories reduces the chance of generating faulty data due to uncertainty and vagueness. The categories can be defined in such a way that the resulting data can be placed on an ordinal scale. Because the sample size is large, the ordinal data can be treated as continuous data by appeal to the central limit theorem. As the dependent variable takes more than two values, polytomous models rather than dichotomous models are suitable.

Measures
The joint distribution of two attributes with I and J categories can be cross-tabulated into a contingency table as shown in Table 1. This cross tabulation gives a better overview of the data. A special case of the I×J contingency table is the 2×2 contingency table given in Table 2. The odds ratio can be used to quantify the impact of a technique. It is built from the ratio of two conditional probabilities of different response values under the same condition. When the response is dichotomous, the odds are [15]:

$$\Omega_j = \frac{\pi_{1|j}}{\pi_{2|j}}, \quad j = 1, 2.$$

Based on the product law of compound events, $\pi_{ij} = \pi_{i|j}\,\pi_{\cdot j}$, so that

$$\Omega_j = \frac{\pi_{1j}}{\pi_{2j}}, \quad j = 1, 2.$$

When the response variable is not dichotomous, that is, when it has more than two options, it is called polytomous. Polytomous response models thus have more than two categories in the rows, in the columns, or in both. This can be described by the multinomial probability function. Here:

$$\log \dot{\pi}_j = \frac{1}{I} \sum_{i=1}^{I} \log \pi_{i|j},$$

where i = 1, 2, …, I; j = 1, 2, …, J. $\dot{\pi}_j$ is the value of the probability obtained in this way for the jth category of the explanatory variable. This value is called the geometric mean. Back-transformation, eliminating the logarithm, shows its definition:

$$\dot{\pi}_j = \Big( \prod_{i=1}^{I} \pi_{i|j} \Big)^{1/I}.$$

So polytomous logistic models with one explanatory variable can be written as:

$$\log \frac{\pi_{i|j}}{\dot{\pi}_j} = \theta_i + \alpha_{ij}. \quad (1)$$

There is one such equation for each category j of the explanatory variable, as well as for each category i of the response. This is a set of I × J linear equations describing how the multinomial distribution changes across the categories of the explanatory variable. Because the response categories are compared with a mean, the following constraints are imposed on the parameters: $\sum_i \theta_i = 0$, $\sum_i \alpha_{ij} = 0 \;\forall j$ and $\sum_j \alpha_{ij} = 0 \;\forall i$. The estimates may be obtained by solving I sets of J equations. The polytomous model given in (1) satisfies the properties given by (2) and (3).

The overall hypothesis is that energy consumption dynamics depend on the location and the socio-economic status of the household. It is broken down into the following ten hypotheses.
Hypothesis 1: For households living within 15 minutes of the school, the dependence on firewood is less (normal users).
Hypothesis 2: For households living within 15 minutes of the employer, the dependence on firewood is less (normal users).
Hypothesis 3: Low socioeconomic status (indicated by the type of house) implies more time spent on the collection of firewood (normal users).
Hypothesis 4: Low socioeconomic status (indicated by the type of house) implies more kilograms of firewood consumed (normal users).
Hypothesis 5: Low socioeconomic status (indicated by the type of house) implies fewer litres of kerosene consumed (normal users).
Hypothesis 6: Low socioeconomic status of biogas owners (indicated by the type of house) implies more time spent on the collection of firewood before the installation of the plant.
Hypothesis 7: Low socioeconomic status of biogas owners (indicated by the type of house) implies more time spent on the collection of firewood after the installation of the plant.
Hypothesis 8: Low socioeconomic status (indicated by the type of house) implies more time saved after construction of the biogas plant.
Hypothesis 9: Low socioeconomic status (indicated by the type of house) implies more firewood saved after a switch over to a biogas plant for cooking.
Hypothesis 10: More time spent on the collection of firewood before the construction of the biogas plant implies "relatively" more time spent after the construction of the biogas plant.
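As an illustration of the polytomous model described in the Measures subsection, the following minimal sketch estimates its parameters for a saturated model directly from an I×J table of counts. It assumes the model takes the form log(π_{i|j} / geometric mean over i) = θ_i + α_{ij} with sum-to-zero constraints; the symbol names `theta` and `alpha`, the function name, and the example counts are hypothetical and are not taken from the surveys.

```python
import math


def polytomous_parameters(table):
    """Saturated-model estimates from an I x J table of counts.

    Rows are response categories i, columns are explanatory categories j.
    Assumes every cell count is positive, since log(0) is undefined.
    Returns (theta, alpha) with sum-to-zero constraints.
    """
    n_rows, n_cols = len(table), len(table[0])
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]

    # Log conditional probabilities of response i given column j.
    log_p = [[math.log(table[i][j] / col_totals[j]) for j in range(n_cols)]
             for i in range(n_rows)]

    # Log of the geometric mean of the conditional probabilities, per column.
    log_gm = [sum(log_p[i][j] for i in range(n_rows)) / n_rows
              for j in range(n_cols)]

    # Left-hand side of the model: log ratio to the geometric mean.
    y = [[log_p[i][j] - log_gm[j] for j in range(n_cols)] for i in range(n_rows)]

    # Row means give theta; what remains is the interaction alpha.
    theta = [sum(y[i]) / n_cols for i in range(n_rows)]
    alpha = [[y[i][j] - theta[i] for j in range(n_cols)] for i in range(n_rows)]
    return theta, alpha


# Hypothetical counts: 3 response categories (rows) by 2 explanatory ones.
counts = [[30, 10], [20, 20], [10, 30]]
theta, alpha = polytomous_parameters(counts)
```

Under this sketch, a positive `alpha[i][j]` means response category i is over-represented in column j, and the difference `(theta[0] + alpha[0][0]) - (theta[1] + alpha[1][0])` is the log odds of response 0 versus response 1 in column 0.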

Data
The primary data for this study were collected from 700 households. They were obtained from two sample surveys of 300 households of normal energy users and 400 households of biogas consumers. These households are located in different regions of Nepal. The basic set of questions was the same in both surveys; only some questions were modified to suit the two categories of respondents. A pre-test of the questionnaire and training of the interviewers ensured the quality of the collected data. The possible responses were provided as multiple choice options with answers classified on an ordinal scale. The details of the variables are given in Table 3, which shows that the variables analyzed here are categorical data classified on an ordinal scale. As the sample size is large, these ordinal data can be treated as continuous data by appeal to the central limit theorem.

Result
The responses to the questions in the questionnaire were structured. All the possible answers to each question were worked out during the pre-test and offered as multiple choice options. This resulted in categorical data that could be classified on an ordinal scale. For example, as seen from Table 3, the 'time spent in the collection of firewood' for biogas users is classified into six categories. According to the amount of time devoted to the collection of firewood, these categories are labelled 0, 1, 2, 3, 4, 5.
Here 0 stands for no time spent, whereas 5 denotes the maximum time of 1-2 hours spent. These numbers are on an ordinal scale because their order reflects the amount of time spent. As seen from Table 3, this also holds true for national grid energy users. The same is true for the attribute 'type of house', labelled 1, 2, 3 and 4 for both biogas users and grid energy users. Here a household with a concrete house, labelled 1, is highest in socioeconomic status, whereas a household with a mud house, labelled 4, ranks lowest. For the attributes 'amount of firewood saved' and 'amount of time saved', the value labels are 0, 1, 2, 3 and 1, 2, 3, 4 respectively. These numbers are also on an ordinal scale, going from smallest to largest as the amount of firewood saved and the amount of time saved increase. Similarly, 'distance from employer' and 'distance from school' are classified on an ordinal scale of 1, 2 according to closeness to these places.
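The ordinal coding described above leads directly to a contingency table. The following minimal sketch cross-tabulates two coded attributes; the function name and the eight responses are hypothetical illustrations and are not survey data. Only the code ranges (0 to 5 for time spent, 1 to 4 for type of house) follow the text.

```python
from collections import Counter


def cross_tabulate(row_codes, col_codes, row_levels, col_levels):
    """Count how often each (row code, column code) pair occurs.

    Levels are passed explicitly so that categories nobody chose
    still appear in the table with a count of zero.
    """
    pair_counts = Counter(zip(row_codes, col_codes))
    return [[pair_counts[(r, c)] for c in col_levels] for r in row_levels]


# Hypothetical coded responses for eight households.
time_spent = [0, 5, 4, 1, 2, 5, 0, 3]   # 0 = no time ... 5 = 1-2 hours
house_type = [1, 4, 4, 2, 3, 4, 1, 3]   # 1 = concrete ... 4 = mud

table = cross_tabulate(time_spent, house_type, range(6), range(1, 5))
```

Each row of `table` corresponds to one time category and each column to one house type, which is the layout assumed by the tests of independence that follow.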

The attributes mentioned in the ten hypotheses of section 2 are then tested for independence using the chi-square test of independence of attributes. Table 4 shows the results of these tests, computed under the assumption that the null hypothesis of independence is true. G² is the likelihood-ratio statistic measuring the degree of association between these variables. It is useful when a cell frequency is 0. As seen from Table 4, both chi-square and G² give the same results. The results are highly significant for eight of the ten hypotheses shown in Table 4. For national grid energy users there is a great degree of dependence between time spent in the collection of firewood and the distance of the household from the school. Similarly, there is dependence between time spent in the collection of firewood and the distance of the household from the employer. The dependence between socio-economic status (indicated by type of house) and time taken to collect the firewood is highly significant. The dependence between socio-economic status and amount of firewood used is also highly significant. The computation of the parameters of the polytomous model fitted to the data in Table 5 is explained in detail under hypothesis 1. The values of the parameters obtained from this model under hypothesis 1 are given in Table 6. Table 7 gives the values of all parameters obtained by fitting the model to the cross-tabulated data. Here positive estimates indicate over-representation and negative estimates indicate under-representation of the variables. The strength of these relationships is further described in terms of the parameters of the models fitted to the data from the contingency tables. The interpretation of these parameters in terms of odds ratios is provided in Table 8.
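The two test statistics mentioned above can be sketched as follows. This is a minimal illustration using only the standard library, not the software used in the study; the function name and example counts are hypothetical. Cells with an observed count of 0 contribute nothing to G², following the usual convention that 0·log 0 = 0.

```python
import math


def independence_statistics(table):
    """Pearson chi-square, likelihood-ratio G^2 and degrees of freedom.

    Assumes every row total and column total is positive, so that all
    expected counts under independence are positive.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)

    chi2 = 0.0
    g2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            observed = table[i][j]
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # empty cells add zero to G^2
                g2 += 2.0 * observed * math.log(observed / expected)

    df = (n_rows - 1) * (n_cols - 1)
    return chi2, g2, df


# Hypothetical 2 x 2 table with a clear association.
chi2, g2, df = independence_statistics([[30, 10], [10, 30]])
```

Both statistics are compared with the chi-square distribution on `df` degrees of freedom; for one degree of freedom the 5% critical value is 3.841.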
The data in Table 5 are now modelled using the polytomous model of section 2. The values of the parameters obtained from the cross-tabulated data in Table 5 are given in Table 6. From Table 6, the odds in favour of a person having a school less than 15 minutes from home and spending no time in the collection of firewood versus spending 45 min-1 hour are 0.5. The odds of not having a school within 15 minutes and spending no time in the collection of firewood are 12 times those of having a school within 15 minutes. So the data here do not validate hypothesis 1. The details of the parameters obtained by fitting polytomous models are given in Table 7. The impact of the interrelationships between the variables in Table 7 is summarized in terms of odds ratios in Table 8.

Table 8 (rows for biogas users, 400 households). Each row gives the hypothesis number, the variables compared, the odds ratio, and the interpretation.

6
Variables: Socioeconomic status (indicated by type of house) versus time taken to collect the firewood per day before biogas plant.
Odds ratio: 3 times more in favor of people with mud houses than households with concrete houses for spending more than 60 minutes in the collection of firewood rather than no time.
Interpretation: Odds in favor tilted towards households living in mud houses; this is an indicator of energy poverty.

7
Variables: Socioeconomic status (indicated by type of house) versus time taken to collect the firewood per day after biogas plant.
Odds ratio: More than 2 times for people with mud houses than people with concrete houses for spending more than 60 minutes in the collection of firewood rather than no time.
Interpretation: Odds in favor tilted towards households living in mud houses; this is an indicator of reduction in energy poverty.

8
Variables: Time saved from firewood collection versus socioeconomic status (indicated by type of house) after the construction of biogas plant.
Odds ratio: 1.55 times more for people in concrete houses than people in mud houses.
Interpretation: Odds in favor tilted towards people in concrete houses. This indicates that the benefit of biogas in terms of time saved per day is substantial not only in low socioeconomic groups but also in high socioeconomic groups.

9
Variables: Amount of firewood saved from biogas plant versus socioeconomic status (indicated by type of house) after the construction of biogas plant.
Odds ratio: 1.67 times more for people in concrete houses than people in mud houses.
Interpretation: Odds in favor tilted towards people in concrete houses. This indicates that the benefit of biogas in terms of firewood saved per day is substantial not only in low socioeconomic groups but also in high socioeconomic groups.

10
Variables: Time spent in the collection of firewood before versus time taken to collect the firewood after biogas construction.
Odds ratio: More than 1.28 times for households spending no time in the collection of firewood after the plant, compared with households spending less than 15 minutes after, to have spent more than 60 minutes rather than no time before the plant.
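Odds ratios of the kind summarized in Table 8 compare two response categories across two groups of households. The following minimal sketch shows that computation on a table of counts. The function name and the counts are hypothetical and do not reproduce the survey figures, and all four cells involved are assumed to be nonzero.

```python
def odds_ratio(table, row_a, row_b, col_a, col_b):
    """Odds of response row_a versus row_b in group col_a,
    divided by the same odds in group col_b.

    Rows are response categories and columns are groups.
    Assumes the four cells used are all positive.
    """
    odds_in_a = table[row_a][col_a] / table[row_b][col_a]
    odds_in_b = table[row_a][col_b] / table[row_b][col_b]
    return odds_in_a / odds_in_b


# Hypothetical counts. Rows: 0 = no time, 1 = more than 60 minutes.
# Columns: 0 = concrete house, 1 = mud house.
counts = [[50, 20],
          [10, 10]]

# Odds of "more than 60 minutes" versus "no time", mud relative to concrete.
ratio = odds_ratio(counts, 1, 0, 1, 0)
```

With these illustrative counts the odds are 10/20 = 0.5 for mud houses and 10/50 = 0.2 for concrete houses, so `ratio` is 2.5, read as odds tilted towards mud houses.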

CONCLUSION
The benefits of energy use are not only direct but also permeate intangibly into several sectors. The factors related to the causes and effects of energy use are studied here using statistics. The results of this paper try to fill the knowledge gap by using statistical methods to quantify the impact of various factors related to energy consumption. The results are based on two sample surveys of 400 households of biogas consumers and 300 households of normal energy users. The time and energy spent in the collection of firewood have been minutely analyzed in connection with other variables. The categorical data used here are summarized in Table 3. Ten different hypotheses related to the energy consumption patterns of normal users and biogas users are tested with the help of these data. In eight of these ten hypotheses the null hypothesis of no dependence is rejected. It is found that the location of the school, the location of the employer and socioeconomic status play a critical role in the energy consumption dynamics of both types of users. Energy poverty is assessed through socioeconomic poverty with the help of these hypotheses. Odds ratios have been used to quantify the impact, and a summary of the impact in terms of odds ratios is provided in Table 8. The positive effects of biogas are seen to be substantial, and the size of this effect is shown here with the help of odds ratios. The results obtained here can be generalized to other countries in Asia and Africa that are similar to Nepal.