Fiona: A Geospatial Data Engine Library for Python

2023-11-05 05:10

This article introduces Fiona, a geospatial data engine library for Python, as a reference for developers who work with spatial data files.


Fiona 1.6.3


Fiona reads and writes spatial data files

Fiona is OGR’s neat, nimble, no-nonsense API for Python programmers.


Fiona is designed to be simple and dependable. It focuses on reading and writing data in standard Python IO style and relies upon familiar Python types and protocols such as files, dictionaries, mappings, and iterators instead of classes specific to OGR. Fiona can read and write real-world data using multi-layered GIS formats and zipped virtual file systems and integrates readily with other Python GIS packages such as pyproj, Rtree, and Shapely.

For more details, see:

  • Fiona home page

  • Fiona docs and manual

  • Fiona examples

Usage

Collections

Records are read from and written to file-like Collection objects returned from the fiona.open() function. Records are mappings modeled on the GeoJSON format. They don’t have any spatial methods of their own, so if you want to do anything fancy with them you will probably need Shapely or something like it. Here is an example of using Fiona to read some records from one data file, change their geometry attributes, and write them to a new data file.

import fiona

# Register format drivers with a context manager
with fiona.drivers():

    # Open a file for reading. We'll call this the "source."
    with fiona.open('tests/data/coutwildrnp.shp') as source:

        # The file we'll write to, the "sink", must be initialized
        # with a coordinate system, a format driver name, and
        # a record schema.  We can get initial values from the open
        # collection's ``meta`` property and then modify them as
        # desired.
        meta = source.meta
        meta['schema']['geometry'] = 'Point'

        # Open an output file, using the same format driver and
        # coordinate reference system as the source. The ``meta``
        # mapping fills in the keyword parameters of fiona.open().
        with fiona.open('test_write.shp', 'w', **meta) as sink:

            # Process only the records intersecting a box.
            for f in source.filter(bbox=(-107.0, 37.0, -105.0, 39.0)):

                # Get a point on the boundary of the record's
                # geometry.
                f['geometry'] = {
                    'type': 'Point',
                    'coordinates': f['geometry']['coordinates'][0][0]}

                # Write the record out.
                sink.write(f)

        # The sink's contents are flushed to disk and the file is
        # closed when its ``with`` block ends. This effectively
        # executes ``sink.flush(); sink.close()``.

# At the end of the ``with fiona.drivers()`` block, context
# manager exits and all drivers are de-registered.
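Because records are plain mappings, the per-record logic of the example above can be tried without Fiona or any data file. The following is a minimal sketch under stated assumptions: the two records, their names, and their coordinates are made up, and the helper functions (ring_bounds, intersects_bbox, to_boundary_point) are hypothetical and not part of Fiona. The box test only compares bounding boxes, which is a crude stand-in for source.filter(bbox=...), not a description of how Fiona or OGR filter records.

```python
def ring_bounds(ring):
    """Return (minx, miny, maxx, maxy) for a sequence of (x, y) tuples."""
    xs = [pt[0] for pt in ring]
    ys = [pt[1] for pt in ring]
    return min(xs), min(ys), max(xs), max(ys)


def intersects_bbox(record, bbox):
    """Bounding-box overlap test for a Polygon record (exterior ring only)."""
    minx, miny, maxx, maxy = ring_bounds(record['geometry']['coordinates'][0])
    bminx, bminy, bmaxx, bmaxy = bbox
    return (minx <= bmaxx and maxx >= bminx and
            miny <= bmaxy and maxy >= bminy)


def to_boundary_point(record):
    """Copy a Polygon record, replacing its geometry with the first
    vertex of its exterior ring, as the loop in the example does."""
    new_record = dict(record)
    new_record['geometry'] = {
        'type': 'Point',
        'coordinates': record['geometry']['coordinates'][0][0],
    }
    return new_record


# Two hypothetical GeoJSON-like records; names and coordinates are made up.
records = [
    {'type': 'Feature', 'id': '0', 'properties': {'NAME': 'Inside'},
     'geometry': {'type': 'Polygon', 'coordinates': [
         [(-106.0, 38.0), (-105.5, 38.0), (-105.5, 38.5), (-106.0, 38.0)]]}},
    {'type': 'Feature', 'id': '1', 'properties': {'NAME': 'Outside'},
     'geometry': {'type': 'Polygon', 'coordinates': [
         [(-112.0, 41.0), (-111.5, 41.0), (-111.5, 41.5), (-112.0, 41.0)]]}},
]

# Same box as in the example above.
bbox = (-107.0, 37.0, -105.0, 39.0)
points = [to_boundary_point(r) for r in records if intersects_bbox(r, bbox)]
print(points)
```

Copying the record with dict() before replacing its geometry leaves the source record untouched, which helps when the same record is needed again later.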

The fiona.drivers() function and context manager are new in 1.1. The example above shows the way to use it to register and de-register drivers in a deterministic and efficient way. Code written for Fiona 1.0 will continue to work: opened collections may manage the global driver registry if no other manager is present.

Reading Multilayer data

Collections can also be made from single layers within multilayer files or directories of data. The target layer is specified by name or by its integer index within the file or directory. The fiona.listlayers() function provides an index ordered list of layer names.

with fiona.drivers():
    for layername in fiona.listlayers('tests/data'):
        with fiona.open('tests/data', layer=layername) as src:
            print(layername, len(src))

# Output:
# (u'coutwildrnp', 67)

Layers can also be specified by index. In this case, layer=0 and layer='test_uk' specify the same layer in the data file or directory.

with fiona.drivers():
    for i, layername in enumerate(fiona.listlayers('tests/data')):
        with fiona.open('tests/data', layer=i) as src:
            print(i, layername, len(src))

# Output:
# (0, u'coutwildrnp', 67)
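The equivalence between a layer name and its integer index follows from the index-ordered list that fiona.listlayers() returns. As a rough illustration only, the lookup can be sketched in plain Python. The resolve_layer helper is hypothetical and is not how Fiona resolves layers internally; the list of names is the one printed in the output above.

```python
def resolve_layer(layer_names, layer):
    """Return the layer name for either a name or an integer index.

    layer_names is an index-ordered list such as the one returned by
    fiona.listlayers(); this helper itself is hypothetical.
    """
    if isinstance(layer, int):
        return layer_names[layer]
    if layer in layer_names:
        return layer
    raise ValueError("no such layer: %r" % (layer,))


# The single layer name printed for 'tests/data' in the output above.
layer_names = ['coutwildrnp']

by_index = resolve_layer(layer_names, 0)
by_name = resolve_layer(layer_names, 'coutwildrnp')
print(by_index == by_name)
```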

Writing Multilayer data

Multilayer data can be written as well. Layers must be specified by name when writing.

with fiona.drivers():

    # Read the schema, CRS, and first record from the source shapefile.
    with fiona.open('tests/data/coutwildrnp.shp') as src:
        meta = src.meta
        f = next(src)

    # Write that record to a layer named 'bar' in the /tmp/foo directory.
    with fiona.open('/tmp/foo', 'w', layer='bar', **meta) as dst:
        dst.write(f)

    print(fiona.listlayers('/tmp/foo'))

    # Read the new layer back to check its contents.
    with fiona.open('/tmp/foo', layer='bar') as src:
        print(len(src))
        f = next(src)
        print(f['geometry']['type'])
        print(f['properties'])

# Output:
# [u'bar']
# 1
# Polygon
# OrderedDict([(u'PERIMETER', 1.22107), (u'FEATURE2', None), (u'NAME', u'Mount Naomi Wilderness'), (u'FEATURE1', u'Wilderness'), (u'URL', u'http://www.wilderness.net/index.cfm?fuse=NWPS&sec=wildView&wname=Mount%20Naomi'), (u'AGBUR', u'FS'), (u'AREA', 0.0179264), (u'STATE_FIPS', u'49'), (u'WILDRNP020', 332), (u'STATE', u'UT')])

A view of the /tmp/foo directory will confirm the creation of the new files.

$ ls /tmp/foo
bar.cpg bar.dbf bar.prj bar.shp bar.shx

Collections from archives and virtual file systems

Zip and Tar archives can be treated as virtual filesystems and Collections can be made from paths and layers within them. In other words, Fiona lets you read and write zipped Shapefiles.

with fiona.drivers():
    for i, layername in enumerate(
            fiona.listlayers(
                '/',
                vfs='zip://tests/data/coutwildrnp.zip')):
        with fiona.open(
                '/',
                vfs='zip://tests/data/coutwildrnp.zip',
                layer=i) as src:
            print(i, layername, len(src))

# Output:
# (0, u'coutwildrnp', 67)
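To check which shapefile layers a zip archive holds before passing it to Fiona, the standard library's zipfile module is enough. This is a sketch that does not use Fiona at all: the shapefile_layers helper is hypothetical, and the archive is a throwaway built from empty placeholder members, so it only illustrates the naming relationship between archive members and layer names.

```python
import os
import tempfile
import zipfile


def shapefile_layers(zip_path):
    """Return the sorted base names of the .shp members of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            os.path.splitext(os.path.basename(name))[0]
            for name in archive.namelist()
            if name.lower().endswith('.shp')
        )


# Build a throwaway archive of empty placeholder members. A real zipped
# shapefile would contain actual .shp, .shx, .dbf, and .prj data.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, 'example.zip')
with zipfile.ZipFile(zip_path, 'w') as archive:
    for member in ('coutwildrnp.shp', 'coutwildrnp.shx',
                   'coutwildrnp.dbf', 'coutwildrnp.prj'):
        archive.writestr(member, b'')

layers = shapefile_layers(zip_path)
print(layers)
```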

Fiona CLI

Fiona’s command line interface, named “fio”, is documented at docs/cli.rst. Its fio info command pretty-prints information about a data file.

$ fio info --indent 2 tests/data/coutwildrnp.shp
{
  "count": 67,
  "crs": "EPSG:4326",
  "driver": "ESRI Shapefile",
  "bounds": [
    -113.56424713134766,
    37.0689811706543,
    -104.97087097167969,
    41.99627685546875
  ],
  "schema": {
    "geometry": "Polygon",
    "properties": {
      "PERIMETER": "float:24.15",
      "FEATURE2": "str:80",
      "NAME": "str:80",
      "FEATURE1": "str:80",
      "URL": "str:101",
      "AGBUR": "str:80",
      "AREA": "float:24.15",
      "STATE_FIPS": "str:80",
      "WILDRNP020": "int:10",
      "STATE": "str:80"
    }
  }
}
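The schema values in this output pack a type name together with a field width and, for floats, a precision ("float:24.15", "str:80"). A small sketch of reading the JSON and splitting those strings is shown here. The parse_field_type helper is hypothetical and written only for illustration (the change log mentions Fiona's own prop_type() and prop_width() functions, which are not used here), and the JSON is an abridged copy of the output above.

```python
import json


def parse_field_type(spec):
    """Split a schema string such as 'float:24.15' into
    (type name, width, precision); missing parts are None."""
    name, _, size = spec.partition(':')
    if not size:
        return name, None, None
    width, _, precision = size.partition('.')
    return name, int(width), (int(precision) if precision else None)


# Abridged copy of the ``fio info`` output shown above.
info = json.loads('''
{
  "count": 67,
  "crs": "EPSG:4326",
  "driver": "ESRI Shapefile",
  "schema": {
    "geometry": "Polygon",
    "properties": {
      "PERIMETER": "float:24.15",
      "NAME": "str:80",
      "URL": "str:101",
      "WILDRNP020": "int:10"
    }
  }
}
''')

fields = {
    name: parse_field_type(spec)
    for name, spec in info['schema']['properties'].items()
}
for name in sorted(fields):
    print(name, fields[name])
```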

Installation

Fiona requires Python 2.6, 2.7, 3.3, or 3.4 and GDAL/OGR 1.8+. To build from a source distribution you will need a C compiler and GDAL and Python development headers and libraries (libgdal1-dev for Debian/Ubuntu, gdal-dev for CentOS/Fedora).

To build from a repository copy, you will also need Cython to build C sources from the project’s .pyx files. See the project’s requirements-dev.txt file for guidance.

The Kyngchaos GDAL frameworks will satisfy the GDAL/OGR dependency for OS X, as will Homebrew’s GDAL Formula (brew install gdal).

Python Requirements

Fiona depends on the modules six, cligj, argparse, and ordereddict (the two latter modules are standard in Python 2.7+). Pip will fetch these requirements for you, but users installing Fiona from a Windows installer must get them separately.

Unix-like systems

Assuming you’re using a virtualenv (if not, skip to the 4th command) and GDAL/OGR libraries, headers, and gdal-config program are installed to well known locations on your system via your system’s package manager (brew install gdal using Homebrew on OS X), installation is this simple.

$ mkdir fiona_env
$ virtualenv fiona_env
$ source fiona_env/bin/activate
(fiona_env)$ pip install Fiona

If gdal-config is not available or if GDAL/OGR headers and libs aren’t installed to a well known location, you must set include dirs, library dirs, and libraries options via the setup.cfg file or setup command line as shown below (using git).

(fiona_env)$ git clone git://github.com/Toblerity/Fiona.git
(fiona_env)$ cd Fiona
(fiona_env)$ python setup.py build_ext -I/path/to/gdal/include -L/path/to/gdal/lib -lgdal install

Or specify that build options should be provided by a particular gdal-config program.

(fiona_env)$ GDAL_CONFIG=/path/to/gdal-config pip install .

Windows

Binary installers are available at http://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona and coming eventually to PyPI.

You can download a binary distribution of GDAL from here.  You will also need to download the compiled libraries and headers (include files).

When building from source on Windows, it is important to know that setup.py cannot rely on gdal-config, which is only present on UNIX systems, to discover the locations of header files and libraries that Fiona needs to compile its C extensions. On Windows, these paths need to be provided by the user. You will need to find the include files and the library files for gdal and use setup.py as follows.

$ python setup.py build_ext -I<path to gdal include files> -lgdal_i -L<path to gdal library>
$ python setup.py install

Note: The GDAL dll (gdal111.dll) and gdal-data directory need to be in your Windows PATH otherwise Fiona will fail to work.

Development and testing

Building from the source requires Cython. Tests require Nose. If the GDAL/OGR libraries, headers, and gdal-config program are installed to well known locations on your system (via your system’s package manager), you can do this:

(fiona_env)$ git clone git://github.com/Toblerity/Fiona.git
(fiona_env)$ cd Fiona
(fiona_env)$ pip install -e .
(fiona_env)$ nosetests

If you have a non-standard environment, you’ll need to specify the include and lib dirs and GDAL library on the command line:

(fiona_env)$ python setup.py build_ext -I/path/to/gdal/include -L/path/to/gdal/lib -lgdal develop
(fiona_env)$ nosetests

Changes

All issue numbers are relative to https://github.com/Toblerity/Fiona/issues.

1.6.3 (2015-12-22)

  • Daytime has been decreasing in the Northern Hemisphere, but is now increasing again as it should.

  • Non-UTF strings were being passed into OGR functions in some situations and on Windows this would sometimes crash a Python process (#303). Fiona now raises errors derived from UnicodeError when field names or field values can’t be encoded.

1.6.2 (2015-09-22)

  • Providing only PROJ4 representations in the dataset meta property resulted in loss of CRS information when using the fiona.open(…, **src.meta) as dst pattern (#265). This bug has been addressed by adding a crs_wkt item to the meta property and extending fiona.open() and the collection constructor to look for and prioritize this keyword argument.

1.6.1 (2015-08-12)

  • Bug fix: Fiona now deserializes JSON-encoded string properties provided by the OGR GeoJSON driver (#244, #245, #246).

  • Bug fix: proj4 data was not copied properly into binary distributions due to a typo (#254).

Special thanks to WFMU DJ Liz Berg for the awesome playlist that’s fueling my release sprint. Check it out at http://wfmu.org/playlists/shows/62083. You can’t unhear Love Coffin.

1.6.0 (2015-07-21)

  • Upgrade Cython requirement to 0.22 (#214).

  • New BytesCollection class (#215).

  • Add GDAL’s OpenFileGDB driver to registered drivers (#221).

  • Implement CLI commands as plugins (#228).

  • Raise click.abort instead of calling sys.exit, preventing surprising exits (#236).

1.5.1 (2015-03-19)

  • Restore test data to sdists by fixing MANIFEST.in (#216).

1.5.0 (2015-02-02)

  • Finalize GeoJSON feature sequence options (#174).

  • Fix for reading of datasets that don’t support feature counting (#190).

  • New test dataset (#188).

  • Fix for encoding error (#191).

  • Remove confusing warning (#195).

  • Add data files for binary wheels (#196).

  • Add control over drivers enabled when reading datasets (#203).

  • Use cligj for CLI options involving GeoJSON (#204).

  • Fix fio-info --bounds help (#206).

1.4.8 (2014-11-02)

  • Add missing crs_wkt property as in Rasterio (#182).

1.4.7 (2014-10-28)

  • Fix setting of CRS from EPSG codes (#149).

1.4.6 (2014-10-21)

  • Handle 3D coordinates in bounds() (#178).

1.4.5 (2014-10-18)

  • Add --bbox option to fio-cat (#163).

  • Skip geopackage tests if run from an sdist (#167).

  • Add fio-bounds and fio-distrib.

  • Restore fio-dump to working order.

1.4.4 (2014-10-13)

  • Fix accidental requirement on GDAL 1.11 introduced in 1.4.3 (#164).

1.4.3 (2014-10-10)

  • Add support for geopackage format (#160).

  • Add -f and --format aliases for --driver in CLI (#162).

  • Add --version option and env command to CLI.

1.4.2 (2014-10-03)

  • --dst-crs and --src-crs options for fio cat and collect (#159).

1.4.1 (2014-09-30)

  • Fix encoding bug in collection’s __getitem__ (#153).

1.4.0 (2014-09-22)

  • Add fio cat and fio collect commands (#150).

  • Return of Python 2.6 compatibility (#148).

  • Improved CRS support (#149).

1.3.0 (2014-09-17)

  • Add single metadata item accessors to fio info (#142).

  • Move fio to setuptools entry point (#142).

  • Add fio dump and load commands (#143).

  • Remove fio translate command.

1.2.0 (2014-09-02)

  • Always show property width and precision in schema (#123).

  • Write datetime properties of features (#125).

  • Reset spatial filtering in filter() (#129).

  • Accept datetime.date objects as feature properties (#130).

  • Add slicing to collection iterators (#132).

  • Add geometry object masks to collection iterators (#136).

  • Change source layout to match Shapely and Rasterio (#138).

1.1.6 (2014-07-23)

  • Implement Collection __getitem__() (#112).

  • Leave GDAL finalization to the DLL’s destructor (#113).

  • Add Collection keys(), values(), items(), __contains__() (#114).

  • CRS bug fix (#116).

  • Add fio CLI program.

1.1.5 (2014-05-21)

  • Addition of cpl_errs context manager (#108).

  • Check for NULLs with ‘==’ test instead of ‘is’ (#109).

  • Open auxiliary files with encoding=’utf-8’ in setup for Python 3 (#110).

1.1.4 (2014-04-03)

  • Convert ‘long’ in schemas to ‘int’ (#101).

  • Carefully map Python schema to the possibly munged internal schema (#105).

  • Allow writing of features with geometry: None (#71).

1.1.3 (2014-03-23)

  • Always register all GDAL and OGR drivers when entering the DriverManager context (#80, #92).

  • Skip unsupported field types with a warning (#91).

  • Allow OGR config options to be passed to fiona.drivers() (#90, #93).

  • Add a bounds() function (#100).

  • Turn on GPX driver.

1.1.2 (2014-02-14)

  • Remove collection slice left in dumpgj (#88).

1.1.1 (2014-02-02)

  • Add an interactive file inspector like the one in rasterio.

  • CRS to_string bug fix (#83).

1.1 (2014-01-22)

  • Use a context manager to manage drivers (#78), a backwards compatible but big change. Fiona is now compatible with rasterio and plays better with the osgeo package.

1.0.3 (2014-01-21)

  • Fix serialization of +init projections (#69).

1.0.2 (2013-09-09)

  • Smarter, better test setup (#65, #66, #67).

  • Add type=’Feature’ to records read from a Collection (#68).

  • Skip geometry validation when using GeoJSON driver (#61).

  • Dumpgj file description reports record properties as a list (as in dict.items()) instead of a dict.

1.0.1 (2013-08-16)

  • Allow ordering of written fields and preservation of field order when reading (#57).

1.0 (2013-07-30)

  • Add prop_type() function.

  • Allow UTF-8 encoded paths for Python 2 (#51). For Python 3, paths must always be str, never bytes.

  • Remove encoding from collection.meta, it’s a file creation option only.

  • Support for linking GDAL frameworks (#54).

0.16.1 (2013-07-02)

  • Add listlayers, open, prop_width to __init__.py:__all__.

  • Reset reading of OGR layer whenever we ask for a collection iterator (#49).

0.16 (2013-06-24)

  • Add support for writing layers to multi-layer files.

  • Add tests to reach 100% Python code coverage.

0.15 (2013-06-06)

  • Get and set numeric field widths (#42).

  • Add support for multi-layer data sources (#17).

  • Add support for zip and tar virtual filesystems (#45).

  • Add listlayers() function.

  • Add GeoJSON to list of supported formats (#47).

  • Allow selection of layers by index or name.

0.14 (2013-05-04)

  • Add option to add JSON-LD in the dumpgj program.

  • Compare values to six.string_types in Collection constructor.

  • Add encoding to Collection.meta.

  • Document dumpgj in README.

0.13 (2013-04-30)

  • Python 2/3 compatibility in a single package. Pythons 2.6, 2.7, 3.3 now supported.

0.12.1 (2013-04-16)

  • Fix messed up linking of README in sdist (#39).

0.12 (2013-04-15)

  • Fix broken installation of extension modules (#35).

  • Log CPL errors at their matching Python log levels.

  • Use upper case for encoding names within OGR, lower case in Python.

0.11 (2013-04-14)

  • Cythonize .pyx files (#34).

  • Work with or around OGR’s internal recoding of record data (#35).

  • Fix bug in serialization of int/float PROJ.4 params.

0.10 (2013-03-23)

  • Add function to get the width of str type properties.

  • Handle validation and schema representation of 3D geometry types (#29).

  • Return {‘geometry’: None} in the case of a NULL geometry (#31).

0.9.1 (2013-03-07)

  • Silence the logger in ogrext.so (can be overridden).

  • Allow user specification of record field encoding (like ‘Windows-1252’ for Natural Earth shapefiles) to help when OGR can’t detect it.

0.9 (2013-03-06)

  • Accessing file metadata (crs, schema, bounds) on never inspected closed files returns None without exceptions.

  • Add a dict of supported_drivers and their supported modes.

  • Raise ValueError for unsupported drivers and modes.

  • Remove asserts from ogrext.pyx.

  • Add validate_record method to collections.

  • Add helpful coordinate system functions to fiona.crs.

  • Promote use of fiona.open over fiona.collection.

  • Handle Shapefile’s mix of LineString/Polygon and multis (#18).

  • Allow users to specify width of shapefile text fields (#20).

0.8 (2012-02-21)

  • Replaced .opened attribute with .closed (product of collection() is always opened). Also a __del__() which will close a Collection, but still not to be depended upon.

  • Added writerecords method.

  • Added a record buffer and better counting of records in a collection.

  • Manage one iterator per collection/session.

  • Added a read-only bounds property.

0.7 (2012-01-29)

  • Initial timezone-naive support for date, time, and datetime fields. Don’t use these field types if you can avoid them. RFC 3339 datetimes in a string field are much better.

0.6.2 (2012-01-10)

  • Diagnose and set the driver property of collection in read mode.

  • Fail if collection paths are not to files. Multi-collection workspaces are a (maybe) TODO.

0.6.1 (2012-01-06)

  • Handle the case of undefined crs for disk collections.

0.6 (2012-01-05)

  • Support for collection coordinate reference systems based on Proj4.

  • Redirect OGR warnings and errors to the Fiona log.

  • Assert that pointers returned from the ograpi functions are not NULL before using.

0.5 (2011-12-19)

  • Support for reading and writing collections of any geometry type.

  • Feature and Geometry classes replaced by mappings (dicts).

  • Removal of Workspace class.

0.2 (2011-09-16)

  • Rename WorldMill to Fiona.

0.1.1 (2008-12-04)

  • Support for features with no geometry.

Credits

Fiona is written by:

  • Sean Gillies <sean.gillies@gmail.com>

  • Kevin Wurster <wursterk@gmail.com>

  • René Buffat <buffat@gmail.com>

  • Kelsey Jordahl <kjordahl@enthought.com>

  • Patrick Young <patrick.young@digitalglobe.com>

  • Hannes Gräuler <graeuler@geoplex.de>

  • Johan Van de Wauw <johan.vandewauw@gmail.com>

  • Jacob Wasserman <jwasserman@gmail.com>

  • Joshua Arnott <josh@snorfalorpagus.net>

  • Ryan Grout <rgrout@continuum.io>

  • Michael Weisman <mweisman@gmail.com>

  • Brendan Ward <bcward@consbio.org>

  • Michele Citterio <michele@citterio.net>

  • Miro Hrončok <miro@hroncok.cz>

  • fredj <frederic.junod@camptocamp.com>

  • wilsaj <wilson.andrew.j+github@gmail.com>

  • Brandon Liu <bdon@bdon.org>

  • Hannes Gräuler <hgraeule@uos.de>

  • Ludovic Delauné <ludotux@gmail.com>

  • Martijn Visser <mgvisser@gmail.com>

  • Oliver Tonnhofer <olt@bogosoft.com>

  • Stefano Costa <steko@iosa.it>

  • dimlev <dimlev@gmail.com>

  • Ariel Nunez <ingenieroariel@gmail.com>

Fiona would not be possible without the great work of Frank Warmerdam and other GDAL/OGR developers.

Some portions of this work were supported by a grant (for Pleiades) from the U.S. National Endowment for the Humanities (http://www.neh.gov).

 

File | Type | Py Version | Uploaded on | Size
Fiona-1.6.3-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl | Python Wheel | cp27 | 2015-12-22 | 14MB
Fiona-1.6.3-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl | Python Wheel | cp34 | 2015-12-22 | 14MB
Fiona-1.6.3.tar.gz | Source | | 2015-12-22 | 1MB

  • Downloads (All Versions):

  • 568 downloads in the last day

  • 3659 downloads in the last week

  • 10498 downloads in the last month

  • Author:  Sean Gillies

  • Home Page:    http://github.com/Toblerity/Fiona

  • Keywords:  gis vector feature data

  • License:      BSD

  • Categories

    • Development Status :: 5 - Production/Stable

    • Intended Audience :: Developers

    • Intended Audience :: Science/Research

    • License :: OSI Approved :: BSD License

    • Operating System :: OS Independent

    • Programming Language :: Python :: 2

    • Programming Language :: Python :: 3

    • Topic :: Scientific/Engineering :: GIS

  • Requires Distributions

    • six

    • click-plugins

    • cligj

  • Package Index Owner:  seang

  • DOAP record:  Fiona-1.6.3.xml


Reposted from: https://my.oschina.net/u/2306127/blog/601234
