How does Python define binary?

You should already know:

Basic Python — Learn Python and Data Science concepts interactively on Dataquest.

A binary variable is a categorical variable that can only take one of two values, usually represented as a Boolean — True or False — or an integer variable — 0 or 1 — where $0$ typically indicates that the attribute is absent, and $1$ indicates that it is present.

Some examples of binary variables, i.e. attributes, are:

  • Smoking is a binary variable with only two possible values: yes or no
  • A medical test has two possible outcomes: positive or negative
  • Gender is traditionally described as male or female
  • Health status can be defined as diseased or healthy
  • Company types may have two values: private or public
  • E-mails can be assigned into two categories: spam or not
  • Credit card transactions can be fraud or not

In some applications, it may be useful to construct a binary variable from other types of data. If you can turn a non-binary attribute into only two categories, you have a binary variable. For example, the numerical variable of age can be divided into two groups: 'less than 30' or 'equal or greater than 30'.
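The age split described above can be sketched in plain Python. This is only an illustration: the `ages` values and the name `is_30_or_older` are made up and do not come from any dataset.

```python
# Hypothetical ages, invented for illustration only
ages = [19, 25, 30, 31, 42]

# Each comparison yields True or False, so the result is a binary variable:
# False means 'less than 30', True means 'equal or greater than 30'
is_30_or_older = [age >= 30 for age in ages]

print(is_30_or_older)  # [False, False, True, True, True]
```

Note that the choice between `>=` and `>` decides which group the boundary value 30 falls into.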

Datasets used in machine learning applications often contain binary variables. Applications such as medical diagnosis, spam analysis, facial recognition, and financial fraud detection all rely on them.

In Python, a binary variable is most directly represented by the Boolean data type, bool, whose only two values are $True$ and $False$.

# Boolean data type
x = True
y = False
print(type(x), type(y))

Out:

<class 'bool'> <class 'bool'>
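The Boolean and the 0/1 integer representations are closely related in Python, because bool is a subclass of int. The short sketch below shows this; the `smokes` list is hypothetical data used only to show that summing Booleans counts the True values.

```python
# bool is a subclass of int, so True and False behave like 1 and 0
print(issubclass(bool, int))   # True
print(True == 1, False == 0)   # True True

# Hypothetical smoking indicators, invented for illustration
smokes = [True, False, True, True]

# Summing a list of booleans counts how many are True
n_smokers = sum(smokes)
print(n_smokers)               # 3
```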

Additionally, the bool() function converts any object to a Boolean value. It returns $True$ for every value except those that Python treats as false, which among the built-in types are:

  • Empty sequences and collections (string, list, tuple, dictionary, set, range)
  • Numeric zeros (0, 0.0, 0j)
  • None and False itself

print("Boolean value of an empty list is", bool([]))
print("Boolean value of zero is", bool(0))
print("Boolean value of number 10 is", bool(10))
print("Boolean value of an empty string is", bool(''))
print("Boolean value of a string is", bool('string'))

Out:

Boolean value of an empty list is False
Boolean value of zero is False
Boolean value of number 10 is True
Boolean value of an empty string is False
Boolean value of a string is True
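Beyond the cases printed above, bool() also returns False for other empty built-in containers and for objects whose class defines __len__() returning 0 (or __bool__() returning False). The following is a minimal sketch; the `Basket` class is hypothetical and exists only to illustrate the rule.

```python
class Basket:
    """Hypothetical container, used only to illustrate truthiness."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        # bool() falls back to __len__ when __bool__ is not defined
        return len(self.items)


results = {
    "empty set": bool(set()),
    "empty range": bool(range(0)),
    "empty Basket": bool(Basket([])),
    "non-empty Basket": bool(Basket(["apple"])),
    "string '0'": bool("0"),  # non-empty string, so True
}

for label, value in results.items():
    print(label, "->", value)
```

The last entry is a common surprise: the string '0' is not empty, so it converts to True even though the number 0 converts to False.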

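To observe binary variables in practice, a real dataset named birthwt ('Risk Factors Associated with Low Infant Birth Weight') is loaded through the statsmodels library. The get_rdataset() function downloads the data from the Rdatasets collection, so it needs an internet connection.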

import statsmodels.api as sm
dataset1 = sm.datasets.get_rdataset(dataname='birthwt', package='MASS')
df1 = dataset1.data

df1.head()

The description of the dataset, taken from its help text (available as dataset1.__doc__), is given below.

  • low : an indicator of whether the birth weight is less than 2.5 kg
  • age : mother’s age in years
  • lwt : mother’s weight in pounds at last menstrual period
  • race : mother’s race (1 = white, 2 = black, 3 = other)
  • smoke : smoking status during pregnancy
  • ptl : number of previous premature labours
  • ht : history of hypertension
  • ui : presence of uterine irritability
  • ftv : number of physician visits during the first trimester
  • bwt : birth weight in grams

As the dataset description shows, the low, smoke, ht, and ui attributes are binary variables: each is a 0/1 indicator. In pandas, the value_counts() method returns the count of each unique value in a column.

# find counts of the variables
df1['smoke'].value_counts()

Out:

0    115
1     74
Name: smoke, dtype: int64
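When no description is available, one rough way to spot candidate binary columns is to count the distinct values in each column. The sketch below assumes pandas is installed and uses a small hypothetical DataFrame named `toy`, whose values are invented and are not the birthwt data. It also shows value_counts(normalize=True), which returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical data, invented for illustration (not the birthwt values)
toy = pd.DataFrame({
    "smoke": [0, 1, 0, 0, 1],
    "age": [19, 33, 20, 21, 18],
    "ui": [1, 0, 0, 0, 1],
})

# normalize=True returns the share of each value instead of its count
smoke_share = toy["smoke"].value_counts(normalize=True)

# A column with exactly two distinct values is a candidate binary variable
binary_columns = [col for col in toy.columns if toy[col].nunique() == 2]

print(smoke_share)
print(binary_columns)
```

This is only a heuristic: a column that could take many values may contain just two of them in a small sample, so the result should be checked against the meaning of each variable.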

In the following example, a numerical variable, age, will be converted to a binary variable.

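# convert a numerical variable to a binary variable
df1['new_age'] = df1['age'] > 30  # the comparison already returns a boolean column
df1['new_age'] = df1['new_age'].astype('bool')  # astype() returns a new Series, so assign it back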

print('Type of the new variable:\n', type(df1['new_age'].iloc[0]), '\n')
print('Value Counts of the new variable:\n', df1['new_age'].value_counts())

Out:

Type of the new variable:
 <class 'numpy.bool_'> 

Value Counts of the new variable:
 False    169
True      20
Name: new_age, dtype: int64
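If the 0/1 integer form is preferred over True/False, a Boolean column can be converted with astype(int). The sketch below assumes pandas is installed; the `ages` Series is hypothetical and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical ages, invented for illustration only
ages = pd.Series([19, 33, 20, 45, 31], name="age")

# The comparison produces a boolean Series
over_30 = ages > 30

# Converting to int maps False to 0 and True to 1
over_30_int = over_30.astype(int)

print(over_30.tolist())      # [False, True, False, True, True]
print(over_30_int.tolist())  # [0, 1, 0, 1, 1]
```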

How do you define a binary number in Python?

In Python, writing a number in binary takes one more step than writing it in decimal: start the literal with the prefix '0b' (a zero followed by a lowercase b; an uppercase B also works). For example, 0b11 is the binary number 11, which equals decimal 3. It's not hard, but it is extra work.
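A short sketch of binary literals follows. The variable names are illustrative; int(text, 2) is the built-in way to parse a string of binary digits.

```python
# A binary literal is just another way to write an integer
value = 0b11
print(value)           # 3
print(type(value))     # <class 'int'>

# Binary literals can be mixed freely with decimal numbers
print(0b1010 + 1)      # 11

# Parse a string of binary digits by passing base 2 to int()
parsed = int("11", 2)
print(parsed)          # 3
```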

How does Python read binary data?

The open() function opens a file in text format by default. To open a file in binary format, add 'b' to the mode parameter. Hence the "rb" mode opens the file in binary format for reading, while the "wb" mode opens the file in binary format for writing. Unlike text files, binary files are not human-readable.
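The two modes can be tried with a small round trip. This is a minimal sketch: the file name example.bin and the `payload` bytes are made up, and the file is written to a temporary directory so nothing else is touched.

```python
import os
import tempfile

# Hypothetical payload: four raw byte values
payload = bytes([0, 1, 2, 255])

# Write to a file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.bin")

# "wb" opens the file for writing in binary mode
with open(path, "wb") as f:
    f.write(payload)

# "rb" opens the file for reading in binary mode; read() returns bytes
with open(path, "rb") as f:
    data = f.read()

print(type(data))   # <class 'bytes'>
print(list(data))   # [0, 1, 2, 255]

os.remove(path)     # clean up the example file
```

In binary mode Python performs no text decoding or newline translation, which is why read() returns a bytes object rather than a str.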

How does Python store binary values?

To assign a value written in binary to a variable, use the 0b prefix. It tells the interpreter that the digits following 0b are binary digits. The variable itself holds an ordinary integer (int), not a special binary type. Note: to display a value in binary format, use the bin() function, which returns a string such as '0b11'.
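The sketch below shows a few built-in ways to display an integer in binary. The variable names are illustrative only.

```python
value = 0b101                    # stored as the integer 5

as_text = bin(value)             # string with the 0b marker
no_prefix = format(value, "b")   # binary digits only
padded = f"{value:08b}"          # zero-padded to 8 digits

print(value)       # 5
print(as_text)     # 0b101
print(no_prefix)   # 101
print(padded)      # 00000101
```

Printing the variable directly shows the decimal form, because the binary notation is only a way of writing the literal and is not remembered by the integer.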

What is a binary variable in Python?

A binary variable is a categorical variable that can take only one of two values, usually represented as a Boolean (True or False) or as an integer (0 or 1), where $0$ typically indicates that the attribute is absent and $1$ indicates that it is present.