This repository has been archived by the owner on May 6, 2021. It is now read-only.
<span id="categorical"></span><h1><span class="yiyi-st" id="yiyi-77">Categorical Data</span></h1>
<blockquote>
<p>Original: <a href="http://pandas.pydata.org/pandas-docs/stable/categorical.html">http://pandas.pydata.org/pandas-docs/stable/categorical.html</a></p>
<p>Translators: <a href="https://github.com/wizardforcel">飞龙</a> <a href="http://usyiyi.cn/">UsyiyiCN</a></p>
<p>Proofreader: (position open)</p>
</blockquote>
<div class="versionadded">
<p><span class="yiyi-st" id="yiyi-78"><span class="versionmodified">New in version 0.15.</span></span></p>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-79">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-80">Although <cite>pandas.Categorical</cite> existed in earlier versions, the ability to use categorical data in <cite>Series</cite> and <cite>DataFrame</cite> is new.</span></p>
</div>
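<p><span class="yiyi-st" id="yiyi-81">This is a short introduction to the pandas categorical data type, including a brief comparison with R's <code class="docutils literal"><span class="pre">factor</span></code>.</span></p>
<p><span class="yiyi-st" id="yiyi-82"><cite>Categoricals</cite> are a pandas data type corresponding to categorical variables in statistics: a variable that can take only a limited, and usually fixed, number of possible values (<cite>categories</cite>; <cite>levels</cite> in R).</span><span class="yiyi-st" id="yiyi-83"> Examples are gender, social class, blood type, country affiliation, observation time, or ratings on a Likert scale.</span></p>
<p><span class="yiyi-st" id="yiyi-84">In contrast to statistical categorical variables, categorical data may have an order (for example "strongly agree" vs. "agree", or "first observation" vs. "second observation"), but numerical operations (addition, division, ...) are not possible.</span></p>
<p><span class="yiyi-st" id="yiyi-85">Every value of categorical data is either in <cite>categories</cite> or is <cite>np.nan</cite>.</span><span class="yiyi-st" id="yiyi-86"> Order is defined by the order of <cite>categories</cite>, not by the lexical order of the values.</span><span class="yiyi-st" id="yiyi-87"> Internally, the data structure consists of a <cite>categories</cite> array and an integer array of <cite>codes</cite> that point to the actual values in the <cite>categories</cite> array.</span></p>
<p><span class="yiyi-st" id="yiyi-88">The categorical data type is useful in the following cases:</span></p>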
<ul class="simple">
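<li><span class="yiyi-st" id="yiyi-89">A string variable that contains only a few distinct values.</span><span class="yiyi-st" id="yiyi-90"> Converting such a string variable to a categorical variable saves memory, see <a class="reference internal" href="#categorical-memory"><span class="std std-ref">here</span></a>.</span></li>
<li><span class="yiyi-st" id="yiyi-91">The lexical order of a variable differs from its logical order ("one", "two", "three").</span><span class="yiyi-st" id="yiyi-92"> After converting to a categorical and specifying an order on the categories, sorting and min/max use the logical order instead of the lexical order, see <a class="reference internal" href="#categorical-sort"><span class="std std-ref">here</span></a>.</span></li>
<li><span class="yiyi-st" id="yiyi-93">As a signal to other Python libraries that this column should be treated as a categorical variable (for example, to choose suitable statistical methods or plot types).</span></li>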
</ul>
<p><span class="yiyi-st" id="yiyi-94">See also the <a class="reference internal" href="api.html#api-categorical"><span class="std std-ref">API docs on categoricals</span></a>.</span></p>
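<p>The internal layout described above (a <cite>categories</cite> array plus integer <cite>codes</cite>) and the memory saving for repetitive strings can be sketched as follows. This is a minimal illustration, assuming a pandas version that provides the <code>.cat</code> accessor and <code>memory_usage(deep=True)</code>; the sample values and variable names are made up for the example, and exact byte counts vary by version and platform.</p>

```python
import pandas as pd

# A string column with only three distinct values, repeated many times.
raw = pd.Series(["low", "high", "medium", "low"] * 2500)
cat = raw.astype("category")

# Inferred categories are the sorted unique values.
categories = cat.cat.categories.tolist()
# Each value is stored as a small integer pointing into `categories`.
first_codes = cat.cat.codes[:4].tolist()

# Compare the memory footprint of the two representations.
raw_bytes = raw.memory_usage(deep=True)
cat_bytes = cat.memory_usage(deep=True)
print(categories, first_codes, cat_bytes < raw_bytes)
```

<p>The codes index into the category list, so only one copy of each distinct string needs to be kept.</p>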
<div class="section" id="object-creation">
<h2><span class="yiyi-st" id="yiyi-95">Object Creation</span></h2>
<p><span class="yiyi-st" id="yiyi-96">A categorical <cite>Series</cite> or a categorical column in a <cite>DataFrame</cite> can be created in several ways:</span></p>
<p><span class="yiyi-st" id="yiyi-97">By specifying <code class="docutils literal"><span class="pre">dtype="category"</span></code> when constructing a <cite>Series</cite>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">"category"</span><span class="p">)</span>
<span class="gp">In [2]: </span><span class="n">s</span>
<span class="gr">Out[2]: </span>
<span class="go">0 a</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">3 a</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [a, b, c]</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-98">By converting an existing <cite>Series</cite> or column to the <code class="docutils literal"><span class="pre">category</span></code> dtype:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [3]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"A"</span><span class="p">:[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">]})</span>
<span class="gp">In [4]: </span><span class="n">df</span><span class="p">[</span><span class="s2">"B"</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s2">"A"</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'category'</span><span class="p">)</span>
<span class="gp">In [5]: </span><span class="n">df</span>
<span class="gr">Out[5]: </span>
<span class="go"> A B</span>
<span class="go">0 a a</span>
<span class="go">1 b b</span>
<span class="go">2 c c</span>
<span class="go">3 a a</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-99">By using certain special functions:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [6]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s1">'value'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">20</span><span class="p">)})</span>
<span class="gp">In [7]: </span><span class="n">labels</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"{0} - {1}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">9</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span> <span class="p">]</span>
<span class="gp">In [8]: </span><span class="n">df</span><span class="p">[</span><span class="s1">'group'</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">cut</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">value</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">105</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="n">right</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">labels</span><span class="o">=</span><span class="n">labels</span><span class="p">)</span>
<span class="gp">In [9]: </span><span class="n">df</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="gr">Out[9]: </span>
<span class="go"> value group</span>
<span class="go">0 65 60 - 69</span>
<span class="go">1 49 40 - 49</span>
<span class="go">2 56 50 - 59</span>
<span class="go">3 43 40 - 49</span>
<span class="go">4 43 40 - 49</span>
<span class="go">5 91 90 - 99</span>
<span class="go">6 32 30 - 39</span>
<span class="go">7 87 80 - 89</span>
<span class="go">8 36 30 - 39</span>
<span class="go">9 8 0 - 9</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-100">For more information on <a class="reference internal" href="generated/pandas.cut.html#pandas.cut" title="pandas.cut"><code class="xref py py-func docutils literal"><span class="pre">cut()</span></code></a>, see the <a class="reference internal" href="reshaping.html#reshaping-tile-cut"><span class="std std-ref">documentation</span></a>.</span></p>
<p><span class="yiyi-st" id="yiyi-101">By passing a <a class="reference internal" href="generated/pandas.Categorical.html#pandas.Categorical" title="pandas.Categorical"><code class="xref py py-class docutils literal"><span class="pre">pandas.Categorical</span></code></a> object to a <cite>Series</cite> or assigning it to a <cite>DataFrame</cite>:</span></p>
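<p>Because the binning example relies on random input, a deterministic variant may be easier to follow. The sketch below assumes fixed sample values and passes the bin edges as a plain list; the variable names are illustrative.</p>

```python
import pandas as pd

values = pd.Series([65, 8, 99, 0])

# Ten labels for ten half-open bins [0, 10), [10, 20), ..., [90, 100).
labels = ["{0} - {1}".format(i, i + 9) for i in range(0, 100, 10)]
bins = list(range(0, 101, 10))

# right=False makes each bin include its left edge and exclude its right edge.
groups = pd.cut(values, bins, right=False, labels=labels)
print(groups.tolist())
print(groups.dtype)
```

<p>The result is a categorical <cite>Series</cite> whose categories are the supplied labels, in bin order.</p>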
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [10]: </span><span class="n">raw_cat</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">],</span>
<span class="gp"> ....:</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="gp"> ....:</span>
<span class="gp">In [11]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="n">raw_cat</span><span class="p">)</span>
<span class="gp">In [12]: </span><span class="n">s</span>
<span class="gr">Out[12]: </span>
<span class="go">0 NaN</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">3 NaN</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [b, c, d]</span>
<span class="gp">In [13]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"A"</span><span class="p">:[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">]})</span>
<span class="gp">In [14]: </span><span class="n">df</span><span class="p">[</span><span class="s2">"B"</span><span class="p">]</span> <span class="o">=</span> <span class="n">raw_cat</span>
<span class="gp">In [15]: </span><span class="n">df</span>
<span class="gr">Out[15]: </span>
<span class="go"> A B</span>
<span class="go">0 a NaN</span>
<span class="go">1 b b</span>
<span class="go">2 c c</span>
<span class="go">3 a NaN</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-102">You can also specify categories in a different order, or make the resulting data ordered, by passing these arguments to <code class="docutils literal"><span class="pre">astype()</span></code>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [16]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">])</span>
<span class="gp">In [17]: </span><span class="n">s_cat</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s2">"category"</span><span class="p">,</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">],</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="gp">In [18]: </span><span class="n">s_cat</span>
<span class="gr">Out[18]: </span>
<span class="go">0 NaN</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">3 NaN</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [b, c, d]</span>
</pre></div>
</div>
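<p>Newer pandas releases no longer accept <code>categories</code> and <code>ordered</code> as keyword arguments to <code>astype()</code>. A hedged equivalent, assuming a pandas version that provides <code>pd.CategoricalDtype</code>, bundles both settings into a dtype object:</p>

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"])

# The dtype object carries the allowed categories and the ordered flag.
cat_dtype = pd.CategoricalDtype(categories=["b", "c", "d"], ordered=False)
s_cat = s.astype(cat_dtype)

# Values that are not listed in the categories ("a" here) become NaN.
print(s_cat.isna().tolist())
print(list(s_cat.cat.categories))
```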
<p><span class="yiyi-st" id="yiyi-103">Categorical data has a specific <code class="docutils literal"><span class="pre">category</span></code> <a class="reference internal" href="basics.html#basics-dtypes"><span class="std std-ref">dtype</span></a>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [19]: </span><span class="n">df</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[19]: </span>
<span class="go">A object</span>
<span class="go">B category</span>
<span class="go">dtype: object</span>
</pre></div>
</div>
<div class="admonition note">
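<p class="first admonition-title"><span class="yiyi-st" id="yiyi-104">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-105">In contrast to R's <cite>factor</cite> function, categorical data does not convert input values to strings; the categories end up with the same data type as the original values.</span></p>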
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-106">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-107">In contrast to R's <cite>factor</cite> function, there is currently no way to assign or change labels at creation time.</span><span class="yiyi-st" id="yiyi-108"> Use <cite>categories</cite> to change the categories after creation.</span></p>
</div>
<p><span class="yiyi-st" id="yiyi-109">To get back the original <cite>Series</cite> or a <cite>numpy</cite> array, use <code class="docutils literal"><span class="pre">Series.astype(original_dtype)</span></code> or <code class="docutils literal"><span class="pre">np.asarray(categorical)</span></code>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [20]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">])</span>
<span class="gp">In [21]: </span><span class="n">s</span>
<span class="gr">Out[21]: </span>
<span class="go">0 a</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">3 a</span>
<span class="go">dtype: object</span>
<span class="gp">In [22]: </span><span class="n">s2</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'category'</span><span class="p">)</span>
<span class="gp">In [23]: </span><span class="n">s2</span>
<span class="gr">Out[23]: </span>
<span class="go">0 a</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">3 a</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [a, b, c]</span>
<span class="gp">In [24]: </span><span class="n">s3</span> <span class="o">=</span> <span class="n">s2</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'string'</span><span class="p">)</span>
<span class="gp">In [25]: </span><span class="n">s3</span>
<span class="gr">Out[25]: </span>
<span class="go">0 a</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">3 a</span>
<span class="go">dtype: object</span>
<span class="gp">In [26]: </span><span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">s2</span><span class="p">)</span>
<span class="gr">Out[26]: </span><span class="n">array</span><span class="p">([</span><span class="s1">'a'</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">,</span> <span class="s1">'a'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">object</span><span class="p">)</span>
</pre></div>
</div>
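<p>A compact version of the round trip, as a sketch. It converts back with <code>astype(object)</code> rather than the <code>'string'</code> alias, on the assumption that in newer pandas releases <code>'string'</code> selects a dedicated string extension dtype instead of <code>object</code>.</p>

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "c", "a"])
s2 = s.astype("category")

# Back to a plain object Series holding the original values.
restored = s2.astype(object)

# Or materialize the values as a NumPy array.
arr = np.asarray(s2)

print(restored.dtype, arr.tolist())
```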
<p><span class="yiyi-st" id="yiyi-110">If you already have <cite>codes</cite> and <cite>categories</cite>, you can use the <a class="reference internal" href="generated/pandas.Categorical.from_codes.html#pandas.Categorical.from_codes" title="pandas.Categorical.from_codes"><code class="xref py py-func docutils literal"><span class="pre">from_codes()</span></code></a> constructor to skip the factorize step that the normal constructor performs:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [27]: </span><span class="n">splitter</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="mi">5</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="p">[</span><span class="mf">0.5</span><span class="p">,</span><span class="mf">0.5</span><span class="p">])</span>
<span class="gp">In [28]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="o">.</span><span class="n">from_codes</span><span class="p">(</span><span class="n">splitter</span><span class="p">,</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"train"</span><span class="p">,</span> <span class="s2">"test"</span><span class="p">]))</span>
</pre></div>
</div>
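<p>The same call with fixed codes instead of random ones makes the mapping visible. This is a minimal sketch; the code values are chosen only for illustration.</p>

```python
import pandas as pd

# Each code is a position in `categories`: 0 -> "train", 1 -> "test".
codes = [0, 1, 1, 0, 1]
cat = pd.Categorical.from_codes(codes, categories=["train", "test"])
s = pd.Series(cat)

print(s.tolist())
```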
</div>
<div class="section" id="description">
<h2><span class="yiyi-st" id="yiyi-111">Description</span></h2>
<p><span class="yiyi-st" id="yiyi-112">Calling <code class="docutils literal"><span class="pre">.describe()</span></code> on categorical data produces output similar to that of a <cite>Series</cite> or <cite>DataFrame</cite> of type <code class="docutils literal"><span class="pre">string</span></code>.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [29]: </span><span class="n">cat</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span> <span class="s2">"c"</span><span class="p">,</span> <span class="s2">"c"</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"b"</span><span class="p">,</span> <span class="s2">"a"</span><span class="p">,</span> <span class="s2">"c"</span><span class="p">])</span>
<span class="gp">In [30]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"cat"</span><span class="p">:</span><span class="n">cat</span><span class="p">,</span> <span class="s2">"s"</span><span class="p">:[</span><span class="s2">"a"</span><span class="p">,</span> <span class="s2">"c"</span><span class="p">,</span> <span class="s2">"c"</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span><span class="p">]})</span>
<span class="gp">In [31]: </span><span class="n">df</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span>
<span class="gr">Out[31]: </span>
<span class="go"> cat s</span>
<span class="go">count 3 3</span>
<span class="go">unique 2 2</span>
<span class="go">top c c</span>
<span class="go">freq 2 2</span>
<span class="gp">In [32]: </span><span class="n">df</span><span class="p">[</span><span class="s2">"cat"</span><span class="p">]</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span>
<span class="gr">Out[32]: </span>
<span class="go">count 3</span>
<span class="go">unique 2</span>
<span class="go">top c</span>
<span class="go">freq 2</span>
<span class="go">Name: cat, dtype: object</span>
</pre></div>
</div>
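<p>The individual statistics can also be read from the result by label. A short sketch using the same data as above:</p>

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
desc = pd.Series(cat).describe()

# count ignores NaN; unique counts only the categories actually observed,
# so the unused category "b" is not included.
print(desc["count"], desc["unique"], desc["top"], desc["freq"])
```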
</div>
<div class="section" id="working-with-categories">
<h2><span class="yiyi-st" id="yiyi-113">Working with categories</span></h2>
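<p><span class="yiyi-st" id="yiyi-114">Categorical data has a <cite>categories</cite> and an <cite>ordered</cite> property, which list the possible values and whether the ordering matters.</span><span class="yiyi-st" id="yiyi-115"> These properties are exposed as <code class="docutils literal"><span class="pre">s.cat.categories</span></code> and <code class="docutils literal"><span class="pre">s.cat.ordered</span></code>.</span><span class="yiyi-st" id="yiyi-116"> If you do not specify categories and ordering manually, they are inferred from the passed values.</span></p>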
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [33]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">"category"</span><span class="p">)</span>
<span class="gp">In [34]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span>
<span class="gr">Out[34]: </span><span class="n">Index</span><span class="p">([</span><span class="s1">u'a'</span><span class="p">,</span> <span class="s1">u'b'</span><span class="p">,</span> <span class="s1">u'c'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'object'</span><span class="p">)</span>
<span class="gp">In [35]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">ordered</span>
<span class="gr">Out[35]: </span><span class="bp">False</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-117">It is also possible to pass in the categories in a specific order:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [36]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">]))</span>
<span class="gp">In [37]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span>
<span class="gr">Out[37]: </span><span class="n">Index</span><span class="p">([</span><span class="s1">u'c'</span><span class="p">,</span> <span class="s1">u'b'</span><span class="p">,</span> <span class="s1">u'a'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'object'</span><span class="p">)</span>
<span class="gp">In [38]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">ordered</span>
<span class="gr">Out[38]: </span><span class="bp">False</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-118">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-119">New categorical data is not automatically ordered.</span><span class="yiyi-st" id="yiyi-120"> You must explicitly pass <code class="docutils literal"><span class="pre">ordered=True</span></code> to indicate an ordered <code class="docutils literal"><span class="pre">Categorical</span></code>.</span></p>
</div>
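<p>To illustrate what <code>ordered=True</code> changes, the sketch below builds an ordered categorical with the category order <code>c &lt; b &lt; a</code>. Sorting and min/max then follow the category order rather than the alphabetical order; the variable names are illustrative.</p>

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c", "a"],
                     categories=["c", "b", "a"], ordered=True)
s = pd.Series(cat)

# Sorting uses the position of each value in `categories`.
sorted_values = s.sort_values().tolist()
smallest, largest = s.min(), s.max()

print(s.cat.ordered, sorted_values, smallest, largest)
```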
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-121">Note</span></p>
<p><span class="yiyi-st" id="yiyi-122">The result of <code class="docutils literal"><span class="pre">Series.unique()</span></code> is not always the same as <code class="docutils literal"><span class="pre">Series.cat.categories</span></code>, because <code class="docutils literal"><span class="pre">Series.unique()</span></code> makes two guarantees: it returns categories in order of appearance, and it includes only values that are actually present.</span></p>
<div class="last highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [39]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="s1">'babc'</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'category'</span><span class="p">,</span> <span class="n">categories</span><span class="o">=</span><span class="nb">list</span><span class="p">(</span><span class="s1">'abcd'</span><span class="p">))</span>
<span class="gp">In [40]: </span><span class="n">s</span>
<span class="gr">Out[40]: </span>
<span class="go">0 b</span>
<span class="go">1 a</span>
<span class="go">2 b</span>
<span class="go">3 c</span>
<span class="go">dtype: category</span>
<span class="go">Categories (4, object): [a, b, c, d]</span>
<span class="c"># categories</span>
<span class="gp">In [41]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span>
<span class="gr">Out[41]: </span><span class="n">Index</span><span class="p">([</span><span class="s1">u'a'</span><span class="p">,</span> <span class="s1">u'b'</span><span class="p">,</span> <span class="s1">u'c'</span><span class="p">,</span> <span class="s1">u'd'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'object'</span><span class="p">)</span>
<span class="c"># uniques</span>
<span class="gp">In [42]: </span><span class="n">s</span><span class="o">.</span><span class="n">unique</span><span class="p">()</span>
<span class="gr">Out[42]: </span>
<span class="go">[b, a, c]</span>
<span class="go">Categories (3, object): [b, a, c]</span>
</pre></div>
</div>
</div>
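<p>The same comparison can be reproduced without the keyword form of <code>astype()</code>. This sketch assumes a pandas version that provides <code>pd.CategoricalDtype</code>:</p>

```python
import pandas as pd

dtype = pd.CategoricalDtype(categories=list("abcd"))
s = pd.Series(list("babc")).astype(dtype)

# All declared categories, including the unused "d", in declared order.
all_categories = s.cat.categories.tolist()
# Only the values that occur, in order of first appearance.
present = s.unique().tolist()

print(all_categories, present)
```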
<div class="section" id="renaming-categories">
<h3><span class="yiyi-st" id="yiyi-123">Renaming categories</span></h3>
<p><span class="yiyi-st" id="yiyi-124">Categories are renamed by assigning new values to the <code class="docutils literal"><span class="pre">Series.cat.categories</span></code> property or by using the <code class="xref py py-func docutils literal"><span class="pre">Categorical.rename_categories()</span></code> method:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [43]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">"category"</span><span class="p">)</span>
<span class="gp">In [44]: </span><span class="n">s</span>
<span class="gr">Out[44]: </span>
<span class="go">0 a</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">3 a</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [a, b, c]</span>
<span class="gp">In [45]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"Group </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">g</span> <span class="k">for</span> <span class="n">g</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span><span class="p">]</span>
<span class="gp">In [46]: </span><span class="n">s</span>
<span class="gr">Out[46]: </span>
<span class="go">0 Group a</span>
<span class="go">1 Group b</span>
<span class="go">2 Group c</span>
<span class="go">3 Group a</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [Group a, Group b, Group c]</span>
<span class="gp">In [47]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">rename_categories</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">])</span>
<span class="gr">Out[47]: </span>
<span class="go">0 1</span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">3 1</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, int64): [1, 2, 3]</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-125">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-126">In contrast to R's <cite>factor</cite>, categorical data can have categories of types other than string.</span></p>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-127">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-128">Be aware that assigning new categories is an in-place operation, whereas most other operations under <code class="docutils literal"><span class="pre">Series.cat</span></code> return a new Series of dtype <cite>category</cite> by default.</span></p>
</div>
<p><span class="yiyi-st" id="yiyi-129">Categories must be unique; otherwise a <cite>ValueError</cite> is raised:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [48]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp"> ....:</span> <span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span>
<span class="gp"> ....:</span> <span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp"> ....:</span> <span class="k">print</span><span class="p">(</span><span class="s2">"ValueError: "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">))</span>
<span class="gp"> ....:</span>
<span class="go">ValueError: Categorical categories must be unique</span>
</pre></div>
</div>
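<p>As a minimal sketch of the non-mutating route, the snippet below renames categories with <code class="docutils literal"><span class="pre">rename_categories()</span></code> and checks that the original Series is left unchanged. The variable names <code class="docutils literal"><span class="pre">renamed</span></code> and <code class="docutils literal"><span class="pre">duplicate_error</span></code> are illustrative only, and the exact error message may differ between pandas versions.</p>

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# rename_categories returns a new Series and leaves `s` untouched.
renamed = s.cat.rename_categories(["Group a", "Group b", "Group c"])
print(renamed.tolist())
print(s.cat.categories.tolist())

# Duplicate category names are rejected with a ValueError.
try:
    s.cat.rename_categories([1, 1, 1])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```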
</div>
<div class="section" id="appending-new-categories">
<h3><span class="yiyi-st" id="yiyi-130">Appending new categories</span></h3>
<p><span class="yiyi-st" id="yiyi-131">Append categories with the <code class="xref py py-func docutils literal"><span class="pre">Categorical.add_categories()</span></code> method:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [49]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">add_categories</span><span class="p">([</span><span class="mi">4</span><span class="p">])</span>
<span class="gp">In [50]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span>
<span class="gr">Out[50]: </span><span class="n">Index</span><span class="p">([</span><span class="s1">u'Group a'</span><span class="p">,</span> <span class="s1">u'Group b'</span><span class="p">,</span> <span class="s1">u'Group c'</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'object'</span><span class="p">)</span>
<span class="gp">In [51]: </span><span class="n">s</span>
<span class="gr">Out[51]: </span>
<span class="go">0 Group a</span>
<span class="go">1 Group b</span>
<span class="go">2 Group c</span>
<span class="go">3 Group a</span>
<span class="go">dtype: category</span>
<span class="go">Categories (4, object): [Group a, Group b, Group c, 4]</span>
</pre></div>
</div>
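<p>The following sketch, using a fresh example Series rather than the one above, suggests how an appended category behaves: the existing values stay the same, and the new category simply shows up with a count of zero. The names <code class="docutils literal"><span class="pre">extended</span></code> and <code class="docutils literal"><span class="pre">counts</span></code> are assumptions made for illustration.</p>

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")

# Appending a category does not change any existing value.
extended = s.cat.add_categories(["d"])
print(extended.tolist())
print(extended.cat.categories.tolist())

# value_counts reports every category, including the unused one.
counts = extended.value_counts()
print(counts["a"], counts["d"])
```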
</div>
<div class="section" id="removing-categories">
<h3><span class="yiyi-st" id="yiyi-132">Removing categories</span></h3>
<p><span class="yiyi-st" id="yiyi-133">Remove categories with the <code class="xref py py-func docutils literal"><span class="pre">Categorical.remove_categories()</span></code> method.</span> <span class="yiyi-st" id="yiyi-134">Values that belong to a removed category are replaced by <code class="docutils literal"><span class="pre">np.nan</span></code></span><span class="yiyi-st" id="yiyi-135">:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [52]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">remove_categories</span><span class="p">([</span><span class="mi">4</span><span class="p">])</span>
<span class="gp">In [53]: </span><span class="n">s</span>
<span class="gr">Out[53]: </span>
<span class="go">0 Group a</span>
<span class="go">1 Group b</span>
<span class="go">2 Group c</span>
<span class="go">3 Group a</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [Group a, Group b, Group c]</span>
</pre></div>
</div>
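<p>The session above removes a category that no value uses, so no <code class="docutils literal"><span class="pre">NaN</span></code> appears. As a small sketch of the other case, the snippet below removes a category that is in use; the name <code class="docutils literal"><span class="pre">without_a</span></code> is illustrative only.</p>

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Removing a category that is in use turns the affected values into NaN.
without_a = s.cat.remove_categories(["a"])
print(without_a.isna().tolist())
print(without_a.cat.categories.tolist())
```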
</div>
<div class="section" id="removing-unused-categories">
<h3><span class="yiyi-st" id="yiyi-136">Removing unused categories</span></h3>
<p><span class="yiyi-st" id="yiyi-137">Unused categories can be removed as well:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [54]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">]))</span>
<span class="gp">In [55]: </span><span class="n">s</span>
<span class="gr">Out[55]: </span>
<span class="go">0 a</span>
<span class="go">1 b</span>
<span class="go">2 a</span>
<span class="go">dtype: category</span>
<span class="go">Categories (4, object): [a, b, c, d]</span>
<span class="gp">In [56]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">remove_unused_categories</span><span class="p">()</span>
<span class="gr">Out[56]: </span>
<span class="go">0 a</span>
<span class="go">1 b</span>
<span class="go">2 a</span>
<span class="go">dtype: category</span>
<span class="go">Categories (2, object): [a, b]</span>
</pre></div>
</div>
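<p>One situation where this tends to be useful is after filtering, because a filtered Series keeps the full category list. The sketch below assumes a boolean-mask filter; <code class="docutils literal"><span class="pre">filtered</span></code> and <code class="docutils literal"><span class="pre">trimmed</span></code> are hypothetical names.</p>

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Filtering drops rows but keeps every category, including "c".
filtered = s[s != "c"]
print(filtered.cat.categories.tolist())

# remove_unused_categories trims the category list to what is present.
trimmed = filtered.cat.remove_unused_categories()
print(trimmed.cat.categories.tolist())
```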
</div>
<div class="section" id="setting-categories">
<h3><span class="yiyi-st" id="yiyi-138">Setting categories</span></h3>
<p><span class="yiyi-st" id="yiyi-139">If you want to remove and add categories in one step (which has some speed advantage), or simply set the categories to a predefined scale, use <code class="xref py py-func docutils literal"><span class="pre">Categorical.set_categories()</span></code>.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [57]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"one"</span><span class="p">,</span><span class="s2">"two"</span><span class="p">,</span><span class="s2">"four"</span><span class="p">,</span> <span class="s2">"-"</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">"category"</span><span class="p">)</span>
<span class="gp">In [58]: </span><span class="n">s</span>
<span class="gr">Out[58]: </span>
<span class="go">0 one</span>
<span class="go">1 two</span>
<span class="go">2 four</span>
<span class="go">3 -</span>
<span class="go">dtype: category</span>
<span class="go">Categories (4, object): [-, four, one, two]</span>
<span class="gp">In [59]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">set_categories</span><span class="p">([</span><span class="s2">"one"</span><span class="p">,</span><span class="s2">"two"</span><span class="p">,</span><span class="s2">"three"</span><span class="p">,</span><span class="s2">"four"</span><span class="p">])</span>
<span class="gp">In [60]: </span><span class="n">s</span>
<span class="gr">Out[60]: </span>
<span class="go">0 one</span>
<span class="go">1 two</span>
<span class="go">2 four</span>
<span class="go">3 NaN</span>
<span class="go">dtype: category</span>
<span class="go">Categories (4, object): [one, two, three, four]</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-140">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-141">Be aware that <code class="xref py py-func docutils literal"><span class="pre">Categorical.set_categories()</span></code> cannot know whether a category was omitted intentionally, because it was misspelled, or (under Python 3) because of a type difference (e.g., NumPy's S1 dtype versus Python strings).</span><span class="yiyi-st" id="yiyi-142"></span></p>
</div>
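<p>To make the caveat in the note concrete, here is a hedged sketch in which one category is deliberately misspelled. No error is raised; the values of the "missing" category silently become <code class="docutils literal"><span class="pre">NaN</span></code>. The misspelling <code class="docutils literal"><span class="pre">"fuor"</span></code> and the name <code class="docutils literal"><span class="pre">typo</span></code> are made up for this example.</p>

```python
import pandas as pd

s = pd.Series(["one", "two", "four"], dtype="category")

# "fuor" is a deliberate typo: set_categories accepts it without complaint,
# and the existing "four" values no longer match any category.
typo = s.cat.set_categories(["one", "two", "fuor"])
print(typo.isna().tolist())
print(typo.cat.categories.tolist())
```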
</div>
</div>
<div class="section" id="sorting-and-order">
<h2><span class="yiyi-st" id="yiyi-143">Sorting and Order</span></h2>
<div class="admonition warning" id="categorical-sort">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-144">Warning</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-145">The default for construction changed in v0.16.0 from the previously implicit <code class="docutils literal"><span class="pre">ordered=True</span></code> to <code class="docutils literal"><span class="pre">ordered=False</span></code>.</span></p>
</div>
<p><span class="yiyi-st" id="yiyi-146">If categorical data is ordered (<code class="docutils literal"><span class="pre">s.cat.ordered</span> <span class="pre">==</span> <span class="pre">True</span></code>), the order of the categories has a meaning and certain operations become possible.</span> <span class="yiyi-st" id="yiyi-147">If the categorical is unordered, <code class="docutils literal"><span class="pre">.min()/.max()</span></code> will raise a <cite>TypeError</cite>.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [61]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">False</span><span class="p">))</span>
<span class="gp">In [62]: </span><span class="n">s</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [63]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">])</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'category'</span><span class="p">,</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [64]: </span><span class="n">s</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [65]: </span><span class="n">s</span>
<span class="gr">Out[65]: </span>
<span class="go">0 a</span>
<span class="go">3 a</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [a < b < c]</span>
<span class="gp">In [66]: </span><span class="n">s</span><span class="o">.</span><span class="n">min</span><span class="p">(),</span> <span class="n">s</span><span class="o">.</span><span class="n">max</span><span class="p">()</span>
<span class="gr">Out[66]: </span><span class="p">(</span><span class="s1">'a'</span><span class="p">,</span> <span class="s1">'c'</span><span class="p">)</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-148">You can make categorical data ordered with <code class="docutils literal"><span class="pre">as_ordered()</span></code> or unordered with <code class="docutils literal"><span class="pre">as_unordered()</span></code>.</span> <span class="yiyi-st" id="yiyi-149">By default, both return a <em>new</em> object.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [67]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">as_ordered</span><span class="p">()</span>
<span class="gr">Out[67]: </span>
<span class="go">0 a</span>
<span class="go">3 a</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [a < b < c]</span>
<span class="gp">In [68]: </span><span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">as_unordered</span><span class="p">()</span>
<span class="gr">Out[68]: </span>
<span class="go">0 a</span>
<span class="go">3 a</span>
<span class="go">1 b</span>
<span class="go">2 c</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, object): [a, b, c]</span>
</pre></div>
</div>
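<p>The difference between ordered and unordered data can be sketched as follows. The snippet assumes a freshly built unordered Series; the names <code class="docutils literal"><span class="pre">unordered</span></code>, <code class="docutils literal"><span class="pre">ordered</span></code> and <code class="docutils literal"><span class="pre">min_failed</span></code> are illustrative.</p>

```python
import pandas as pd

unordered = pd.Series(["a", "b", "c", "a"], dtype="category")

# min()/max() are undefined for unordered categoricals.
try:
    unordered.min()
    min_failed = False
except TypeError:
    min_failed = True
print(min_failed)

# as_ordered returns a new, ordered Series; the original stays unordered.
ordered = unordered.cat.as_ordered()
print(ordered.min(), ordered.max())
print(unordered.cat.ordered, ordered.cat.ordered)
```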
<p><span class="yiyi-st" id="yiyi-150">Sorting uses the order defined by the categories, not any lexical order present on the data type.</span> <span class="yiyi-st" id="yiyi-151">This is true even for string and numeric data:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [69]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">"category"</span><span class="p">)</span>
<span class="gp">In [70]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">set_categories</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [71]: </span><span class="n">s</span>
<span class="gr">Out[71]: </span>
<span class="go">0 1</span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">3 1</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, int64): [2 < 3 < 1]</span>
<span class="gp">In [72]: </span><span class="n">s</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [73]: </span><span class="n">s</span>
<span class="gr">Out[73]: </span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">0 1</span>
<span class="go">3 1</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, int64): [2 < 3 < 1]</span>
<span class="gp">In [74]: </span><span class="n">s</span><span class="o">.</span><span class="n">min</span><span class="p">(),</span> <span class="n">s</span><span class="o">.</span><span class="n">max</span><span class="p">()</span>
<span class="gr">Out[74]: </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</pre></div>
</div>
<div class="section" id="reordering">
<h3><span class="yiyi-st" id="yiyi-152">Reordering</span></h3>
<p><span class="yiyi-st" id="yiyi-153">Categories can be reordered with the <code class="xref py py-func docutils literal"><span class="pre">Categorical.reorder_categories()</span></code> and <code class="xref py py-func docutils literal"><span class="pre">Categorical.set_categories()</span></code> methods.</span> <span class="yiyi-st" id="yiyi-154">For <code class="xref py py-func docutils literal"><span class="pre">Categorical.reorder_categories()</span></code>, all old categories must be included in the new categories and no new categories are allowed.</span> <span class="yiyi-st" id="yiyi-155">This necessarily makes the sort order the same as the category order.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [75]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">"category"</span><span class="p">)</span>
<span class="gp">In [76]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">reorder_categories</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [77]: </span><span class="n">s</span>
<span class="gr">Out[77]: </span>
<span class="go">0 1</span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">3 1</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, int64): [2 < 3 < 1]</span>
<span class="gp">In [78]: </span><span class="n">s</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [79]: </span><span class="n">s</span>
<span class="gr">Out[79]: </span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">0 1</span>
<span class="go">3 1</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, int64): [2 < 3 < 1]</span>
<span class="gp">In [80]: </span><span class="n">s</span><span class="o">.</span><span class="n">min</span><span class="p">(),</span> <span class="n">s</span><span class="o">.</span><span class="n">max</span><span class="p">()</span>
<span class="gr">Out[80]: </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-156">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-157">Note the difference between assigning new categories and reordering the categories: the former renames the categories and therefore the individual values in the <cite>Series</cite>, but if the first position was sorted last, the renamed value will still be sorted last.</span> <span class="yiyi-st" id="yiyi-158">Reordering means that the values are sorted differently afterwards, but the individual values in the <cite>Series</cite> are not changed.</span></p>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-159">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-160">If the <cite>Categorical</cite> is not ordered, <code class="docutils literal"><span class="pre">Series.min()</span></code> and <code class="docutils literal"><span class="pre">Series.max()</span></code> will raise a <code class="docutils literal"><span class="pre">TypeError</span></code>.</span> <span class="yiyi-st" id="yiyi-161">Numeric operations such as <code class="docutils literal"><span class="pre">+</span></code>, <code class="docutils literal"><span class="pre">-</span></code>, <code class="docutils literal"><span class="pre">*</span></code>, <code class="docutils literal"><span class="pre">/</span></code> and operations based on them (e.g. <code class="docutils literal"><span class="pre">Series.median()</span></code>, which would need to compute the mean of two values if the length of the array is even) do not work and raise a <code class="docutils literal"><span class="pre">TypeError</span></code>.</span></p>
</div>
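<p>The contrast between renaming and reordering described in the note can be sketched side by side. Both calls receive the same list <code class="docutils literal"><span class="pre">[2, 3, 1]</span></code>, but only renaming changes the stored values. The names <code class="docutils literal"><span class="pre">reordered</span></code> and <code class="docutils literal"><span class="pre">renamed</span></code> are illustrative.</p>

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1], dtype="category")

# Reordering keeps the values and changes only how they sort.
reordered = s.cat.reorder_categories([2, 3, 1], ordered=True)
print(reordered.tolist())
print(reordered.sort_values().tolist())

# Renaming maps old categories to new ones by position (1->2, 2->3, 3->1),
# so the values themselves change.
renamed = s.cat.rename_categories([2, 3, 1])
print(renamed.tolist())
```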
</div>
<div class="section" id="multi-column-sorting">
<h3><span class="yiyi-st" id="yiyi-162">Multi Column Sorting</span></h3>
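<p><span class="yiyi-st" id="yiyi-163">A column of categorical dtype participates in a multi-column sort in the same way as other columns.</span> <span class="yiyi-st" id="yiyi-164">The ordering of the categorical is determined by the <code class="docutils literal"><span class="pre">categories</span></code> of that column.</span></p>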
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [81]: </span><span class="n">dfs</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s1">'A'</span> <span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="s1">'bbeebbaa'</span><span class="p">),</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s1">'e'</span><span class="p">,</span><span class="s1">'a'</span><span class="p">,</span><span class="s1">'b'</span><span class="p">],</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
<span class="gp"> ....:</span> <span class="s1">'B'</span> <span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span> <span class="p">})</span>
<span class="gp"> ....:</span>
<span class="gp">In [82]: </span><span class="n">dfs</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="p">[</span><span class="s1">'A'</span><span class="p">,</span> <span class="s1">'B'</span><span class="p">])</span>
<span class="gr">Out[82]: </span>
<span class="go"> A B</span>
<span class="go">2 e 1</span>
<span class="go">3 e 2</span>
<span class="go">7 a 1</span>
<span class="go">6 a 2</span>
<span class="go">0 b 1</span>
<span class="go">5 b 1</span>
<span class="go">1 b 2</span>
<span class="go">4 b 2</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-165">Reordering the <code class="docutils literal"><span class="pre">categories</span></code> changes the result of subsequent sorts.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [83]: </span><span class="n">dfs</span><span class="p">[</span><span class="s1">'A'</span><span class="p">]</span> <span class="o">=</span> <span class="n">dfs</span><span class="p">[</span><span class="s1">'A'</span><span class="p">]</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">reorder_categories</span><span class="p">([</span><span class="s1">'a'</span><span class="p">,</span><span class="s1">'b'</span><span class="p">,</span><span class="s1">'e'</span><span class="p">])</span>
<span class="gp">In [84]: </span><span class="n">dfs</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="p">[</span><span class="s1">'A'</span><span class="p">,</span><span class="s1">'B'</span><span class="p">])</span>
<span class="gr">Out[84]: </span>
<span class="go"> A B</span>
<span class="go">7 a 1</span>
<span class="go">6 a 2</span>
<span class="go">0 b 1</span>
<span class="go">5 b 1</span>
<span class="go">1 b 2</span>
<span class="go">4 b 2</span>
<span class="go">2 e 1</span>
<span class="go">3 e 2</span>
</pre></div>
</div>
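<p>As a further sketch with made-up data, an ordered categorical is a convenient way to sort by a domain-specific scale such as clothing sizes, where alphabetical order would be wrong. The column names <code class="docutils literal"><span class="pre">size</span></code> and <code class="docutils literal"><span class="pre">qty</span></code> are hypothetical.</p>

```python
import pandas as pd

df = pd.DataFrame({
    # Sizes sort as S < M < L, not alphabetically.
    "size": pd.Categorical(["M", "S", "L", "S"],
                           categories=["S", "M", "L"], ordered=True),
    "qty": [3, 2, 1, 1],
})

# Sort by size first (category order), then by quantity within each size.
out = df.sort_values(by=["size", "qty"])
print(out)
```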
</div>
</div>
<div class="section" id="comparisons">
<h2><span class="yiyi-st" id="yiyi-166">Comparisons</span></h2>
<p><span class="yiyi-st" id="yiyi-167">Categorical data can be compared with other objects in three cases:</span></p>
<blockquote>
<div><ul class="simple">
<li><span class="yiyi-st" id="yiyi-168">equality comparisons (<code class="docutils literal"><span class="pre">==</span></code> and <code class="docutils literal"><span class="pre">!=</span></code>) with a list-like object (list, Series, array, ...) of the same length as the categorical data.</span></li>
<li><span class="yiyi-st" id="yiyi-169">all comparisons (<code class="docutils literal"><span class="pre">==</span></code>, <code class="docutils literal"><span class="pre">!=</span></code>, <code class="docutils literal"><span class="pre">></span></code>, <code class="docutils literal"><span class="pre">>=</span></code>, <code class="docutils literal"><span class="pre"><</span></code>, and <code class="docutils literal"><span class="pre"><=</span></code>) of categorical data to another categorical Series, when <code class="docutils literal"><span class="pre">ordered==True</span></code> and the <cite>categories</cite> are the same.</span></li>
<li><span class="yiyi-st" id="yiyi-170">all comparisons of categorical data with a scalar.</span></li>
</ul>
</div></blockquote>
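<p><span class="yiyi-st" id="yiyi-171">All other comparisons, especially "non-equality" comparisons of two categoricals with different categories, or of a categorical with any list-like object, raise a TypeError.</span></p>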
<div class="admonition note">
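<p class="first admonition-title"><span class="yiyi-st" id="yiyi-172">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-173">Any "non-equality" comparison of categorical data with a <cite>Series</cite>, <cite>np.array</cite>, <cite>list</cite>, or categorical data with different categories or ordering raises a <cite>TypeError</cite>, because a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not.</span></p>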
</div>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [85]: </span><span class="n">cat</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">])</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s2">"category"</span><span class="p">,</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [86]: </span><span class="n">cat_base</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s2">"category"</span><span class="p">,</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [87]: </span><span class="n">cat_base2</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s2">"category"</span><span class="p">,</span> <span class="n">ordered</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="gp">In [88]: </span><span class="n">cat</span>
<span class="gr">Out[88]: </span>
<span class="go">0 1</span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, int64): [3 < 2 < 1]</span>
<span class="gp">In [89]: </span><span class="n">cat_base</span>
<span class="gr">Out[89]: </span>
<span class="go">0 2</span>
<span class="go">1 2</span>
<span class="go">2 2</span>
<span class="go">dtype: category</span>
<span class="go">Categories (3, int64): [3 < 2 < 1]</span>
<span class="gp">In [90]: </span><span class="n">cat_base2</span>
<span class="gr">Out[90]: </span>
<span class="go">0 2</span>
<span class="go">1 2</span>
<span class="go">2 2</span>
<span class="go">dtype: category</span>
<span class="go">Categories (1, int64): [2]</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-174">Comparing with a categorical that has the same categories and ordering, or with a scalar, works:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [91]: </span><span class="n">cat</span> <span class="o">></span> <span class="n">cat_base</span>
<span class="gr">Out[91]: </span>
<span class="go">0 True</span>
<span class="go">1 False</span>
<span class="go">2 False</span>
<span class="go">dtype: bool</span>
<span class="gp">In [92]: </span><span class="n">cat</span> <span class="o">></span> <span class="mi">2</span>
<span class="gr">Out[92]: </span>
<span class="go">0 True</span>
<span class="go">1 False</span>
<span class="go">2 False</span>
<span class="go">dtype: bool</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-175">Equality comparisons work with any list-like object of the same length and with scalars:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [93]: </span><span class="n">cat</span> <span class="o">==</span> <span class="n">cat_base</span>
<span class="gr">Out[93]: </span>
<span class="go">0 False</span>
<span class="go">1 True</span>
<span class="go">2 False</span>
<span class="go">dtype: bool</span>
<span class="gp">In [94]: </span><span class="n">cat</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">])</span>
<span class="gr">Out[94]: </span>
<span class="go">0 True</span>
<span class="go">1 True</span>
<span class="go">2 True</span>
<span class="go">dtype: bool</span>
<span class="gp">In [95]: </span><span class="n">cat</span> <span class="o">==</span> <span class="mi">2</span>
<span class="gr">Out[95]: </span>
<span class="go">0 False</span>
<span class="go">1 True</span>
<span class="go">2 False</span>
<span class="go">dtype: bool</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-176">This does not work because the categories are not the same:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [96]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp"> ....:</span> <span class="n">cat</span> <span class="o">></span> <span class="n">cat_base2</span>
<span class="gp"> ....:</span> <span class="k">except</span> <span class="ne">TypeError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp"> ....:</span> <span class="k">print</span><span class="p">(</span><span class="s2">"TypeError: "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">))</span>
<span class="gp"> ....:</span>
<span class="go">TypeError: Categoricals can only be compared if 'categories' are the same</span>
</pre></div>
</div>
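<p>One way to make such a comparison valid is to give both sides the same categories first. The sketch below is illustrative only: the name <code class="docutils literal"><span class="pre">other</span></code> and the category list <code class="docutils literal"><span class="pre">[3, 2, 1]</span></code> are assumptions, and <code class="docutils literal"><span class="pre">Series.cat.set_categories()</span></code> is used to align the categories before comparing:</p>

```python
import pandas as pd

cat = pd.Series(pd.Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True))
# Built without explicit categories, so its only category is 2.
other = pd.Series(pd.Categorical([2, 2, 2], ordered=True))

try:
    cat > other
    raised = False
except TypeError:
    # The categories differ, so the order comparison is rejected.
    raised = True

# Give `other` the same categories, in the same order, before comparing.
aligned = other.cat.set_categories([3, 2, 1], ordered=True)
result = (cat > aligned).tolist()

print(raised)  # True
print(result)  # [True, False, False], because the order is 3 < 2 < 1
```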
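<p><span class="yiyi-st" id="yiyi-177">If you want to do a “non-equality” comparison of a categorical series with a list-like object that is not categorical data, you need to be explicit and convert the categorical data back to the original values:</span></p>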
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [97]: </span><span class="n">base</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">])</span>
<span class="gp">In [98]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp"> ....:</span> <span class="n">cat</span> <span class="o">></span> <span class="n">base</span>
<span class="gp"> ....:</span> <span class="k">except</span> <span class="ne">TypeError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp"> ....:</span> <span class="k">print</span><span class="p">(</span><span class="s2">"TypeError: "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">))</span>
<span class="gp"> ....:</span>
<span class="go">TypeError: Cannot compare a Categorical for op __gt__ with type &lt;type 'numpy.ndarray'&gt;.</span>
<span class="go">If you want to compare values, use 'np.asarray(cat) &lt;op&gt; other'.</span>
<span class="gp">In [99]: </span><span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">cat</span><span class="p">)</span> <span class="o">></span> <span class="n">base</span>
<span class="gr">Out[99]: </span><span class="n">array</span><span class="p">([</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">bool</span><span class="p">)</span>
</pre></div>
</div>
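<p>The difference between comparing by category order and comparing by raw value can be seen side by side in the following sketch. As before, the definition of <code class="docutils literal"><span class="pre">cat</span></code> is an assumption (ordered, categories <code class="docutils literal"><span class="pre">[3, 2, 1]</span></code>):</p>

```python
import numpy as np
import pandas as pd

cat = pd.Series(pd.Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True))
base = np.array([1, 2, 3])

try:
    cat > base
    raised = False
except TypeError:
    # Order comparisons against non-categorical list-likes are rejected.
    raised = True

# np.asarray drops the categorical wrapper, so plain integers are compared.
by_value = (np.asarray(cat) > base).tolist()

# Against a scalar that is one of the categories, the category order is used.
by_category_order = (cat > 2).tolist()

print(raised)             # True
print(by_value)           # [False, False, False]
print(by_category_order)  # [True, False, False]
```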
</div>
<div class="section" id="operations">
<h2><span class="yiyi-st" id="yiyi-178">Operations</span></h2>
<p><span class="yiyi-st" id="yiyi-179">Apart from <code class="docutils literal"><span class="pre">Series.min()</span></code>, <code class="docutils literal"><span class="pre">Series.max()</span></code> and <code class="docutils literal"><span class="pre">Series.mode()</span></code>, the following operations are possible with categorical data:</span></p>
<p><span class="yiyi-st" id="yiyi-180"><cite>Series</cite> methods like <cite>Series.value_counts()</cite> will use all categories, even if some categories are not present in the data:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [100]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">]))</span>
<span class="gp">In [101]: </span><span class="n">s</span><span class="o">.</span><span class="n">value_counts</span><span class="p">()</span>
<span class="gr">Out[101]: </span>
<span class="go">c 2</span>
<span class="go">b 1</span>
<span class="go">a 1</span>
<span class="go">d 0</span>
<span class="go">dtype: int64</span>
</pre></div>
</div>
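<p>To see that the zero count comes from the declared categories rather than from the data, the sketch below compares the categorical result with the same data converted to <code class="docutils literal"><span class="pre">object</span></code> dtype. The conversion is added here purely for contrast:</p>

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))

# Every declared category appears in the result, including the unused "d".
with_all_categories = s.value_counts().to_dict()

# After dropping the categorical dtype, only observed values are counted.
observed_only = s.astype(object).value_counts().to_dict()

print(with_all_categories)
print(observed_only)
```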
<p><span class="yiyi-st" id="yiyi-181">Groupby will also show “unused” categories:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [102]: </span><span class="n">cats</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">])</span>
<span class="gp">In [103]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"cats"</span><span class="p">:</span><span class="n">cats</span><span class="p">,</span><span class="s2">"values"</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]})</span>
<span class="gp">In [104]: </span><span class="n">df</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="s2">"cats"</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
<span class="gr">Out[104]: </span>
<span class="go"> values</span>
<span class="go">cats </span>
<span class="go">a 1.0</span>
<span class="go">b 2.0</span>
<span class="go">c 4.0</span>
<span class="go">d NaN</span>
<span class="gp">In [105]: </span><span class="n">cats2</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">])</span>
<span class="gp">In [106]: </span><span class="n">df2</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"cats"</span><span class="p">:</span><span class="n">cats2</span><span class="p">,</span><span class="s2">"B"</span><span class="p">:[</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">],</span> <span class="s2">"values"</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]})</span>
<span class="gp">In [107]: </span><span class="n">df2</span><span class="o">.</span><span class="n">groupby</span><span class="p">([</span><span class="s2">"cats"</span><span class="p">,</span><span class="s2">"B"</span><span class="p">])</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
<span class="gr">Out[107]: </span>
<span class="go"> values</span>
<span class="go">cats B </span>
<span class="go">a c 1.0</span>
<span class="go"> d 2.0</span>
<span class="go">b c 3.0</span>
<span class="go"> d 4.0</span>
<span class="go">c c NaN</span>
<span class="go"> d NaN</span>
</pre></div>
</div>
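<p>Later pandas releases than the one documented here let the caller choose whether unused categories appear in the result, through the <code class="docutils literal"><span class="pre">observed</span></code> keyword of <code class="docutils literal"><span class="pre">groupby</span></code>. That keyword is not mentioned in the text above, so the following is a sketch that assumes a pandas version supporting it:</p>

```python
import pandas as pd

cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"],
                      categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# observed=False keeps the unused category "d"; its mean is NaN.
all_groups = df.groupby("cats", observed=False)["values"].mean()

# observed=True (assumed available) keeps only categories present in the data.
observed_groups = df.groupby("cats", observed=True)["values"].mean()

print(all_groups)
print(observed_groups)
```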
<p><span class="yiyi-st" id="yiyi-182">Pivot tables:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [108]: </span><span class="n">raw_cat</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">])</span>
<span class="gp">In [109]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"A"</span><span class="p">:</span><span class="n">raw_cat</span><span class="p">,</span><span class="s2">"B"</span><span class="p">:[</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">],</span> <span class="s2">"values"</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]})</span>
<span class="gp">In [110]: </span><span class="n">pd</span><span class="o">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">values</span><span class="o">=</span><span class="s1">'values'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s1">'A'</span><span class="p">,</span> <span class="s1">'B'</span><span class="p">])</span>
<span class="gr">Out[110]: </span>
<span class="go">A B</span>
<span class="go">a c 1.0</span>
<span class="go"> d 2.0</span>
<span class="go">b c 3.0</span>
<span class="go"> d 4.0</span>
<span class="go">c c NaN</span>
<span class="go"> d NaN</span>
<span class="go">Name: values, dtype: float64</span>
</pre></div>
</div>
</div>
<div class="section" id="data-munging">
<h2><span class="yiyi-st" id="yiyi-183">Data munging</span></h2>
<p><span class="yiyi-st" id="yiyi-184">The optimized pandas data access methods <code class="docutils literal"><span class="pre">.loc</span></code>, <code class="docutils literal"><span class="pre">.iloc</span></code>, <code class="docutils literal"><span class="pre">.ix</span></code>, <code class="docutils literal"><span class="pre">.at</span></code> and <code class="docutils literal"><span class="pre">.iat</span></code> work as normal. </span><span class="yiyi-st" id="yiyi-185">The only difference is the return type (for getting) and that only values already in <cite>categories</cite> can be assigned.</span></p>
<div class="section" id="getting">
<h3><span class="yiyi-st" id="yiyi-186">Getting</span></h3>
<p><span class="yiyi-st" id="yiyi-187">If the slicing operation returns either a <cite>DataFrame</cite> or a column of type <cite>Series</cite>, the <code class="docutils literal"><span class="pre">category</span></code> dtype is preserved.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [111]: </span><span class="n">idx</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Index</span><span class="p">([</span><span class="s2">"h"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="s2">"j"</span><span class="p">,</span><span class="s2">"k"</span><span class="p">,</span><span class="s2">"l"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">,])</span>
<span class="gp">In [112]: </span><span class="n">cats</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">"category"</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="n">idx</span><span class="p">)</span>
<span class="gp">In [113]: </span><span class="n">values</span><span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span>
<span class="gp">In [114]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"cats"</span><span class="p">:</span><span class="n">cats</span><span class="p">,</span><span class="s2">"values"</span><span class="p">:</span><span class="n">values</span><span class="p">},</span> <span class="n">index</span><span class="o">=</span><span class="n">idx</span><span class="p">)</span>
<span class="gp">In [115]: </span><span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="mi">4</span><span class="p">,:]</span>
<span class="gr">Out[115]: </span>
<span class="go"> cats values</span>
<span class="go">j b 2</span>
<span class="go">k b 2</span>
<span class="gp">In [116]: </span><span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="mi">4</span><span class="p">,:]</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[116]: </span>
<span class="go">cats category</span>
<span class="go">values int64</span>
<span class="go">dtype: object</span>
<span class="gp">In [117]: </span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s2">"h"</span><span class="p">:</span><span class="s2">"j"</span><span class="p">,</span><span class="s2">"cats"</span><span class="p">]</span>
<span class="gr">Out[117]: </span>
<span class="go">h a</span>
<span class="go">i b</span>
<span class="go">j b</span>
<span class="go">Name: cats, dtype: category</span>
<span class="go">Categories (3, object): [a, b, c]</span>
<span class="gp">In [118]: </span><span class="n">df</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="s2">"h"</span><span class="p">:</span><span class="s2">"j"</span><span class="p">,</span><span class="mi">0</span><span class="p">:</span><span class="mi">1</span><span class="p">]</span>
<span class="gr">Out[118]: </span>
<span class="go"> cats</span>
<span class="go">h a</span>
<span class="go">i b</span>
<span class="go">j b</span>
<span class="gp">In [119]: </span><span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s2">"cats"</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"b"</span><span class="p">]</span>
<span class="gr">Out[119]: </span>
<span class="go"> cats values</span>
<span class="go">i b 2</span>
<span class="go">j b 2</span>
<span class="go">k b 2</span>
</pre></div>
</div>
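<p>The dtype preservation can also be checked programmatically rather than by reading the printed output. The sketch below reuses the same data and collects the dtype of the categorical column after each kind of selection. The variable names for the results are illustrative:</p>

```python
import pandas as pd

idx = pd.Index(["h", "i", "j", "k", "l", "m", "n"])
cats = pd.Series(["a", "b", "b", "b", "c", "c", "c"], dtype="category", index=idx)
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]}, index=idx)

frame_slice = df.iloc[2:4, :]            # positional slice, returns a DataFrame
column_slice = df.loc["h":"j", "cats"]   # label slice of one column, returns a Series
masked = df[df["cats"] == "b"]           # boolean mask, returns a DataFrame

dtypes = [
    str(frame_slice["cats"].dtype),
    str(column_slice.dtype),
    str(masked["cats"].dtype),
]
print(dtypes)

# The full set of categories survives, even those absent from the slice.
print(list(column_slice.cat.categories))
```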
<p><span class="yiyi-st" id="yiyi-188">An example where the category type is not preserved is if you take a single row: the resulting <cite>Series</cite> is of dtype <code class="docutils literal"><span class="pre">object</span></code>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="c"># get the complete "h" row as a Series</span>
<span class="gp">In [120]: </span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s2">"h"</span><span class="p">,</span> <span class="p">:]</span>
<span class="gr">Out[120]: </span>
<span class="go">cats a</span>
<span class="go">values 1</span>
<span class="go">Name: h, dtype: object</span>
</pre></div>
</div>
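<p>The reason is that a row mixes values from columns of different dtypes, so the resulting <cite>Series</cite> needs a dtype that can hold all of them. A minimal sketch on a small, made-up two-row frame:</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"cats": pd.Categorical(["a", "b"]), "values": [1, 2]},
    index=["h", "i"],
)

# A row spans a categorical column and an integer column.
row = df.loc["h", :]

print(str(df["cats"].dtype))  # the column itself is still categorical
print(str(row.dtype))         # the row falls back to a common dtype
```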
<p><span class="yiyi-st" id="yiyi-189">Returning a single item from categorical data will also return the value, not a categorical of length “1”.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [121]: </span><span class="n">df</span><span class="o">.</span><span class="n">iat</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">]</span>
<span class="gr">Out[121]: </span><span class="s1">'a'</span>
<span class="gp">In [122]: </span><span class="n">df</span><span class="p">[</span><span class="s2">"cats"</span><span class="p">]</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"x"</span><span class="p">,</span><span class="s2">"y"</span><span class="p">,</span><span class="s2">"z"</span><span class="p">]</span>
<span class="gp">In [123]: </span><span class="n">df</span><span class="o">.</span><span class="n">at</span><span class="p">[</span><span class="s2">"h"</span><span class="p">,</span><span class="s2">"cats"</span><span class="p">]</span> <span class="c1"># returns a string</span>
<span class="gr">Out[123]: </span><span class="s1">'x'</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-190">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-191">This is in contrast to R's <cite>factor</cite> function, where <code class="docutils literal"><span class="pre">factor(c(1,2,3))[1]</span></code> returns a single value <cite>factor</cite>.</span></p>
</div>
<p><span class="yiyi-st" id="yiyi-192">To get a single value <cite>Series</cite> of type <code class="docutils literal"><span class="pre">category</span></code>, pass in a list with a single value:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [124]: </span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[[</span><span class="s2">"h"</span><span class="p">],</span><span class="s2">"cats"</span><span class="p">]</span>
<span class="gr">Out[124]: </span>
<span class="go">h x</span>
<span class="go">Name: cats, dtype: category</span>
<span class="go">Categories (3, object): [x, y, z]</span>
</pre></div>
</div>
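<p>The two access styles can be contrasted directly. The following sketch uses a small, made-up frame and checks what each call returns:</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"cats": pd.Categorical(["a", "b", "b"]), "values": [1, 2, 2]},
    index=["h", "i", "j"],
)

# A scalar label returns the underlying value, here a plain string.
scalar = df.at["h", "cats"]

# A one-element list of labels returns a length-1 categorical Series.
single = df.loc[["h"], "cats"]

print(type(scalar).__name__)          # str
print(str(single.dtype), len(single))  # category 1
```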
</div>
<div class="section" id="string-and-datetime-accessors">
<h3><span class="yiyi-st" id="yiyi-193">String and datetime accessors</span></h3>
<div class="versionadded">
<p><span class="yiyi-st" id="yiyi-194"><span class="versionmodified">New in version 0.17.1.</span></span></p>
</div>
<p><span class="yiyi-st" id="yiyi-195">The accessors <code class="docutils literal"><span class="pre">.dt</span></code> and <code class="docutils literal"><span class="pre">.str</span></code> will work if the <code class="docutils literal"><span class="pre">s.cat.categories</span></code> are of an appropriate type:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [125]: </span><span class="n">str_s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="s1">'aabb'</span><span class="p">))</span>
<span class="gp">In [126]: </span><span class="n">str_cat</span> <span class="o">=</span> <span class="n">str_s</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'category'</span><span class="p">)</span>
<span class="gp">In [127]: </span><span class="n">str_cat</span>
<span class="gr">Out[127]: </span>
<span class="go">0 a</span>
<span class="go">1 a</span>
<span class="go">2 b</span>
<span class="go">3 b</span>
<span class="go">dtype: category</span>
<span class="go">Categories (2, object): [a, b]</span>
<span class="gp">In [128]: </span><span class="n">str_cat</span><span class="o">.</span><span class="n">str</span><span class="o">.</span><span class="n">contains</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="gr">Out[128]: </span>
<span class="go">0 True</span>
<span class="go">1 True</span>
<span class="go">2 False</span>
<span class="go">3 False</span>
<span class="go">dtype: bool</span>
<span class="gp">In [129]: </span><span class="n">date_s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">date_range</span><span class="p">(</span><span class="s1">'1/1/2015'</span><span class="p">,</span> <span class="n">periods</span><span class="o">=</span><span class="mi">5</span><span class="p">))</span>
<span class="gp">In [130]: </span><span class="n">date_cat</span> <span class="o">=</span> <span class="n">date_s</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'category'</span><span class="p">)</span>
<span class="gp">In [131]: </span><span class="n">date_cat</span>
<span class="gr">Out[131]: </span>
<span class="go">0 2015-01-01</span>
<span class="go">1 2015-01-02</span>
<span class="go">2 2015-01-03</span>
<span class="go">3 2015-01-04</span>
<span class="go">4 2015-01-05</span>
<span class="go">dtype: category</span>
<span class="go">Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]</span>
<span class="gp">In [132]: </span><span class="n">date_cat</span><span class="o">.</span><span class="n">dt</span><span class="o">.</span><span class="n">day</span>
<span class="gr">Out[132]: </span>
<span class="go">0 1</span>
<span class="go">1 2</span>
<span class="go">2 3</span>
<span class="go">3 4</span>
<span class="go">4 5</span>
<span class="go">dtype: int64</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-196">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-197">The returned <code class="docutils literal"><span class="pre">Series</span></code> (or <code class="docutils literal"><span class="pre">DataFrame</span></code>) is of the same type as if you used the <code class="docutils literal"><span class="pre">.str.&lt;method&gt;</span></code> / <code class="docutils literal"><span class="pre">.dt.&lt;method&gt;</span></code> on a <code class="docutils literal"><span class="pre">Series</span></code> of that type (and not of type <code class="docutils literal"><span class="pre">category</span></code>!</span><span class="yiyi-st" id="yiyi-198">).</span></p>
</div>
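<p><span class="yiyi-st" id="yiyi-199">That means that the values returned from the methods and properties on the accessors of a <code class="docutils literal"><span class="pre">Series</span></code>, and the values returned from the same methods and properties on the accessors of this <code class="docutils literal"><span class="pre">Series</span></code> converted to one of type <cite>category</cite>, will be equal:</span></p>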
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [133]: </span><span class="n">ret_s</span> <span class="o">=</span> <span class="n">str_s</span><span class="o">.</span><span class="n">str</span><span class="o">.</span><span class="n">contains</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="gp">In [134]: </span><span class="n">ret_cat</span> <span class="o">=</span> <span class="n">str_cat</span><span class="o">.</span><span class="n">str</span><span class="o">.</span><span class="n">contains</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="gp">In [135]: </span><span class="n">ret_s</span><span class="o">.</span><span class="n">dtype</span> <span class="o">==</span> <span class="n">ret_cat</span><span class="o">.</span><span class="n">dtype</span>
<span class="gr">Out[135]: </span><span class="bp">True</span>
<span class="gp">In [136]: </span><span class="n">ret_s</span> <span class="o">==</span> <span class="n">ret_cat</span>
<span class="gr">Out[136]: </span>
<span class="go">0 True</span>
<span class="go">1 True</span>
<span class="go">2 True</span>
<span class="go">3 True</span>
<span class="go">dtype: bool</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title"><span class="yiyi-st" id="yiyi-200">Note</span></p>
<p class="last"><span class="yiyi-st" id="yiyi-201">The work is done on the <code class="docutils literal"><span class="pre">categories</span></code> and then a new <code class="docutils literal"><span class="pre">Series</span></code> is constructed. </span><span class="yiyi-st" id="yiyi-202">This has performance implications if you have a <code class="docutils literal"><span class="pre">Series</span></code> of type string in which many elements are repeated (i.e. the number of unique elements in the <code class="docutils literal"><span class="pre">Series</span></code> is much smaller than the length of the <code class="docutils literal"><span class="pre">Series</span></code>). </span><span class="yiyi-st" id="yiyi-203">In this case it can be faster to convert the original <code class="docutils literal"><span class="pre">Series</span></code> to one of type <code class="docutils literal"><span class="pre">category</span></code> and use <code class="docutils literal"><span class="pre">.str.&lt;method&gt;</span></code> or <code class="docutils literal"><span class="pre">.dt.&lt;property&gt;</span></code> on that.</span></p>
</div>
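<p>A rough sketch of the situation the note describes: a long string <code class="docutils literal"><span class="pre">Series</span></code> with only a few distinct values. The data and variable names are made up for illustration, and no timings are claimed here. Whether the categorical route is actually faster depends on the data and the pandas version, so it is worth measuring (for example with <code class="docutils literal"><span class="pre">timeit</span></code>) before relying on it:</p>

```python
import pandas as pd

# 3000 strings, but only three distinct values.
s = pd.Series(["apple", "banana", "cherry"] * 1000)
s_cat = s.astype("category")

# The string method has only this many categories to process;
# the full-length result is then rebuilt from them.
n_categories = len(s_cat.cat.categories)

# Both routes produce the same values.
same_result = s.str.upper().tolist() == s_cat.str.upper().tolist()

print(len(s), n_categories, same_result)  # 3000 3 True
```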
</div>
<div class="section" id="setting">
<h3><span class="yiyi-st" id="yiyi-204">Setting</span></h3>
<p><span class="yiyi-st" id="yiyi-205">Setting values in a categorical column (or <cite>Series</cite>) works as long as the value is included in the <cite>categories</cite>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [137]: </span><span class="n">idx</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Index</span><span class="p">([</span><span class="s2">"h"</span><span class="p">,</span><span class="s2">"i"</span><span class="p">,</span><span class="s2">"j"</span><span class="p">,</span><span class="s2">"k"</span><span class="p">,</span><span class="s2">"l"</span><span class="p">,</span><span class="s2">"m"</span><span class="p">,</span><span class="s2">"n"</span><span class="p">])</span>
<span class="gp">In [138]: </span><span class="n">cats</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">])</span>
<span class="gp">In [139]: </span><span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span>
<span class="gp">In [140]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"cats"</span><span class="p">:</span><span class="n">cats</span><span class="p">,</span><span class="s2">"values"</span><span class="p">:</span><span class="n">values</span><span class="p">},</span> <span class="n">index</span><span class="o">=</span><span class="n">idx</span><span class="p">)</span>
<span class="gp">In [141]: </span><span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="mi">4</span><span class="p">,:]</span> <span class="o">=</span> <span class="p">[[</span><span class="s2">"b"</span><span class="p">,</span><span class="mi">2</span><span class="p">],[</span><span class="s2">"b"</span><span class="p">,</span><span class="mi">2</span><span class="p">]]</span>
<span class="gp">In [142]: </span><span class="n">df</span>
<span class="gr">Out[142]: </span>
<span class="go"> cats values</span>
<span class="go">h a 1</span>
<span class="go">i a 1</span>
<span class="go">j b 2</span>
<span class="go">k b 2</span>
<span class="go">l a 1</span>
<span class="go">m a 1</span>
<span class="go">n a 1</span>
<span class="gp">In [143]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp"> .....:</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="mi">4</span><span class="p">,:]</span> <span class="o">=</span> <span class="p">[[</span><span class="s2">"c"</span><span class="p">,</span><span class="mi">3</span><span class="p">],[</span><span class="s2">"c"</span><span class="p">,</span><span class="mi">3</span><span class="p">]]</span>
<span class="gp"> .....:</span> <span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp"> .....:</span> <span class="k">print</span><span class="p">(</span><span class="s2">"ValueError: "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">))</span>
<span class="gp"> .....:</span>
<span class="go">ValueError: Cannot setitem on a Categorical with a new category, set the categories first</span>
</pre></div>
</div>
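<p>As the error message suggests, the category has to exist before a value can be assigned. One way to do this is <code class="docutils literal"><span class="pre">Series.cat.add_categories()</span></code>, sketched below on a small, made-up <cite>Series</cite>. The output above shows a <code class="docutils literal"><span class="pre">ValueError</span></code>, while some later pandas releases raise a <code class="docutils literal"><span class="pre">TypeError</span></code> instead, so the sketch catches both:</p>

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "a", "a"], categories=["a", "b"]))

# "b" is an existing category, so this assignment is accepted.
s.iloc[1] = "b"

try:
    s.iloc[2] = "c"  # "c" is not one of the categories
    raised = False
except (TypeError, ValueError):
    # The exception type depends on the pandas release.
    raised = True

# Register the new category first, then assign.
s = s.cat.add_categories(["c"])
s.iloc[2] = "c"

print(raised)                    # True
print(s.tolist())                # ['a', 'b', 'c']
print(list(s.cat.categories))    # ['a', 'b', 'c']
```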
<p><span class="yiyi-st" id="yiyi-206">Setting values by assigning categorical data will also check that the <cite>categories</cite> match:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [144]: </span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s2">"j"</span><span class="p">:</span><span class="s2">"k"</span><span class="p">,</span><span class="s2">"cats"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">])</span>
<span class="gp">In [145]: </span><span class="n">df</span>
<span class="gr">Out[145]: </span>
<span class="go"> cats values</span>
<span class="go">h a 1</span>
<span class="go">i a 1</span>
<span class="go">j a 2</span>
<span class="go">k a 2</span>
<span class="go">l a 1</span>
<span class="go">m a 1</span>
<span class="go">n a 1</span>
<span class="gp">In [146]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp"> .....:</span> <span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s2">"j"</span><span class="p">:</span><span class="s2">"k"</span><span class="p">,</span><span class="s2">"cats"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"c"</span><span class="p">])</span>
<span class="gp"> .....:</span> <span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="gp"> .....:</span> <span class="k">print</span><span class="p">(</span><span class="s2">"ValueError: "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">))</span>
<span class="gp"> .....:</span>
<span class="go">ValueError: Cannot set a Categorical with another, without identical categories</span>
</pre></div>
</div>
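<p>One way to avoid the mismatch shown above is to build the new <cite>Categorical</cite> from the categories of the target column. The following is a minimal sketch under that assumption; the frame and the name <cite>new_values</cite> are illustrative and not part of the original example.</p>

```python
import pandas as pd

# Illustrative frame: a categorical column with categories "a" and "b".
df = pd.DataFrame(
    {
        "cats": pd.Categorical(["a", "a", "a"], categories=["a", "b"]),
        "values": [1, 2, 1],
    },
    index=["h", "i", "j"],
)

# Reuse the target column's categories so the assigned data matches exactly.
new_values = pd.Categorical(["b", "b"], categories=df["cats"].cat.categories)

# Label-based slices include both endpoints, so this sets rows "i" and "j".
df.loc["i":"j", "cats"] = new_values

print(df)
print(df["cats"].dtype)
```

<p>Because the categories are taken from the column itself, the assignment passes the category check and the column keeps its categorical dtype.</p>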
<p><span class="yiyi-st" id="yiyi-207">Assigning a <cite>Categorical</cite> to parts of a column of another type uses its values, so the column does not become categorical:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [147]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"a"</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="s2">"b"</span><span class="p">:[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"a"</span><span class="p">]})</span>
<span class="gp">In [148]: </span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="s2">"a"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">])</span>
<span class="gp">In [149]: </span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="mi">3</span><span class="p">,</span><span class="s2">"b"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Categorical</span><span class="p">([</span><span class="s2">"b"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">],</span> <span class="n">categories</span><span class="o">=</span><span class="p">[</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">])</span>
<span class="gp">In [150]: </span><span class="n">df</span>
<span class="gr">Out[150]: </span>
<span class="go"> a b</span>
<span class="go">0 1 a</span>
<span class="go">1 b a</span>
<span class="go">2 b b</span>
<span class="go">3 1 b</span>
<span class="go">4 1 a</span>
<span class="gp">In [151]: </span><span class="n">df</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[151]: </span>
<span class="go">a object</span>
<span class="go">b object</span>
<span class="go">dtype: object</span>
</pre></div>
</div>
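<p>To confirm that only the values are used, the column's dtype can be checked after the assignment. This is a small sketch that assumes an object-dtype string column; the explicit <cite>dtype=object</cite> and the variable names are choices made for this illustration.</p>

```python
import pandas as pd

# Illustrative frame with a plain (non-categorical) string column.
df = pd.DataFrame({"b": pd.Series(["a", "a", "a", "a", "a"], dtype=object)})

# Assign categorical data to rows 2 and 3 of the non-categorical column.
df.loc[2:3, "b"] = pd.Categorical(["b", "b"], categories=["a", "b"])

# Only the values were written; the column itself is not categorical.
is_categorical = isinstance(df["b"].dtype, pd.CategoricalDtype)
print(df["b"].tolist())
print(is_categorical)
```

<p>If a categorical column is wanted afterwards, it has to be requested explicitly, for example with <cite>astype("category")</cite>.</p>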
</div>
<div class="section" id="merging">
<h3><span class="yiyi-st" id="yiyi-208">Merging</span></h3>
<p><span class="yiyi-st" id="yiyi-209">You can concatenate two <cite>DataFrames</cite> that contain categorical data, but the categories of these categoricals need to be the same:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [152]: </span><span class="n">cat</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">([</span><span class="s2">"a"</span><span class="p">,</span><span class="s2">"b"</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s2">"category"</span><span class="p">)</span>
<span class="gp">In [153]: </span><span class="n">vals</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">]</span>
<span class="gp">In [154]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">"cats"</span><span class="p">:</span><span class="n">cat</span><span class="p">,</span> <span class="s2">"vals"</span><span class="p">:</span><span class="n">vals</span><span class="p">})</span>
<span class="gp">In [155]: </span><span class="n">res</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">df</span><span class="p">,</span><span class="n">df</span><span class="p">])</span>
<span class="gp">In [156]: </span><span class="n">res</span>
<span class="gr">Out[156]: </span>
<span class="go"> cats vals</span>
<span class="go">0 a 1</span>
<span class="go">1 b 2</span>
<span class="go">0 a 1</span>
<span class="go">1 b 2</span>
<span class="gp">In [157]: </span><span class="n">res</span><span class="o">.</span><span class="n">dtypes</span>
<span class="gr">Out[157]: </span>
<span class="go">cats category</span>
<span class="go">vals int64</span>
<span class="go">dtype: object</span>
</pre></div>
</div>
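<p>Before concatenating, it can help to check whether two categorical columns already share the same categories and, if not, to widen both to a common set. The sketch below is one possible approach, not the library's prescribed method; the helper name <cite>align_categories</cite> and the sample frames are hypothetical.</p>

```python
import pandas as pd


def align_categories(left, right):
    """Return both categorical Series with a shared set of categories."""
    if left.dtype == right.dtype:
        # The categorical dtypes already agree, so nothing needs to change.
        return left, right
    # Otherwise widen both Series to the union of their categories.
    combined = left.cat.categories.union(right.cat.categories)
    return left.cat.set_categories(combined), right.cat.set_categories(combined)


df_left = pd.DataFrame({"cats": pd.Series(["a", "b"], dtype="category"), "vals": [1, 2]})
df_right = pd.DataFrame({"cats": pd.Series(["c", "d"], dtype="category"), "vals": [3, 4]})

df_left["cats"], df_right["cats"] = align_categories(df_left["cats"], df_right["cats"])

# Both columns now have identical categories, so the result stays categorical.
res = pd.concat([df_left, df_right], ignore_index=True)
print(res.dtypes)
```

<p>Widening to the union keeps every original value valid in both columns; categories that a frame does not use are simply left unused.</p>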
<p><span class="yiyi-st" id="yiyi-210">If the categories are not the same, as in the following case, an error is raised:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [158]: </span><span class="n">df_different</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="gp">In [159]: </span><span class="n">df_different</span><span class="p">[</span><span class="s2">"cats"</span><span class="p">]</span><span class="o">.</span><span class="n">cat</span><span class="o">.</span><span class="n">categories</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"c"</span><span class="p">,</span><span class="s2">"d"</span><span class="p">]</span>
<span class="gp">In [160]: </span><span class="k">try</span><span class="p">:</span>
<span class="gp"> .....:</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">df</span><span class="p">,</span><span class="n">df_different</span><span class="p">])</span>
<span class="gp"> .....:</span> <span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>