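pandas (2): converting nominal data to numeric codes with factorize. This article, the second in a pandas series, shows how factorize turns nominal (label-valued) data into integer codes.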
1. factorize()
Official documentation
This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values. factorize is available as both a top-level function pandas.factorize(), and as a method Series.factorize() and Index.factorize().
pandas.factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None)
Encode input values as an enumerated type or categorical variable.
Parameters:
values:sequence
A 1-D sequence. Sequences that aren’t pandas objects are coerced to ndarrays before factorization.
sort:bool, default False
Sort uniques and shuffle codes to maintain the relationship.
na_sentinel:int or None, default -1
Value to mark “not found”. If None, will not drop the NaN from the uniques of the values.
Changed in version 1.1.2.
size_hint:int, optional
Hint to the hashtable sizer.
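The effect of sort and of the default missing-value sentinel is easier to see on a small array. The following is a minimal sketch with made-up sample values; it deliberately does not pass na_sentinel explicitly, because newer pandas releases replace that argument with use_na_sentinel, and it relies only on the default behavior of marking missing values with -1.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration.
values = np.array(["b", "a", "c", "b", "a"], dtype=object)

# Default (sort=False): codes follow the order of first appearance.
codes, uniques = pd.factorize(values)
# codes -> [0, 1, 2, 0, 1], uniques -> ['b', 'a', 'c']

# sort=True: uniques are sorted and the codes are remapped to match.
sorted_codes, sorted_uniques = pd.factorize(values, sort=True)
# sorted_codes -> [1, 0, 2, 1, 0], sorted_uniques -> ['a', 'b', 'c']

# Missing values get the sentinel code -1 by default
# and are left out of uniques.
with_nan = np.array(["b", None, "a", "b"], dtype=object)
nan_codes, nan_uniques = pd.factorize(with_nan)
# nan_codes -> [0, -1, 1, 0], nan_uniques -> ['b', 'a']
```

In both cases the code is a position in uniques, which is why sorting the uniques forces the codes to change as well.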
Returns
codes:ndarray
An integer ndarray that’s an indexer into uniques. uniques.take(codes) will have the same values as values.
uniques:ndarray, Index, or Categorical
The unique valid values. When values is Categorical, uniques is a Categorical. When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.
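The return types described above, and the uniques.take(codes) round trip, can be checked with a short sketch. The sample labels are made up, and the three input kinds (ndarray, Series, Categorical) are chosen only to exercise each case named in the documentation.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration.
raw = np.array(["red", "green", "red", "blue"], dtype=object)

# ndarray in -> ndarray of uniques out.
codes, uniques = pd.factorize(raw)
restored = uniques.take(codes)  # rebuilds the original values

# Series in -> Index of uniques out.
s_codes, s_uniques = pd.Series(raw).factorize()

# Categorical in -> Categorical of uniques out.
c_codes, c_uniques = pd.factorize(pd.Categorical(raw))

print(type(uniques).__name__, type(s_uniques).__name__, type(c_uniques).__name__)
print(list(restored) == list(raw))
```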
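My understanding
factorize maps the nominal values in a Series to a set of integers, with identical values mapped to the same integer. By default (sort=False), the integers are assigned in order of first appearance: values seen earlier get smaller codes. The first row of the column therefore gets 0 (provided it is not a missing value), the second row gets 1 if its value differs from the first and 0 otherwise, and so on. The return value is a tuple of two elements. The first is an array holding the integer code for each element. The second holds the distinct nominal values with no duplicates; it is an Index when the input is a Series.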
Python example
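The original example is not reproduced here, so the following is only a minimal sketch of the idea under stated assumptions: the DataFrame, the column name color, and the helper names color_code and code_to_label are all hypothetical and chosen for illustration.

```python
import pandas as pd

# Hypothetical data: "color" is a nominal (unordered, label-valued) column.
df = pd.DataFrame({"color": ["red", "green", "red", "blue", "green"]})

# Series.factorize() returns a tuple: (integer codes, Index of unique labels).
codes, uniques = df["color"].factorize()

# Store the numeric encoding next to the original labels.
df["color_code"] = codes  # red -> 0, green -> 1, blue -> 2

# Keep a lookup from code back to label so the encoding can be reversed later.
code_to_label = dict(enumerate(uniques))

print(df)
print(code_to_label)
```

Keeping uniques (or a lookup built from it) matters because the codes alone carry no meaning; they are only positions in that list of labels.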