How to remove 50% of the rows that have a certain column value

Question 1

df.groupby(['target']).count()

target	data
Negative	103210
Positive	211082

Right now, my Positive data is too large. I want to remove 50% of the rows whose value in the target column is Positive. How can I do that?

tdy · Answer 1 · 2021-11-24T04:27:20

To halve the Positive rows, sample 50% of the Positive rows with frac=0.5 and drop those indexes:

indexes = df[df.target == 'Positive'].sample(frac=0.5).index
df = df.drop(indexes)
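
For anyone who wants to check the result, here is a minimal self-contained sketch of the same approach on toy data. The toy frame, the `halved` name, and the `random_state=0` seed are assumptions added for illustration and repeatability; they are not part of the original question.

```python
import pandas as pd

# Toy frame standing in for the real data (assumption): 4 Negative, 8 Positive rows.
df = pd.DataFrame({
    "target": ["Negative"] * 4 + ["Positive"] * 8,
    "data": range(12),
})

# Sample half of the Positive rows; random_state makes the draw repeatable.
indexes = df[df.target == "Positive"].sample(frac=0.5, random_state=0).index

# Drop by index label. This relies on the index labels being unique;
# with duplicate labels, drop() would remove every row sharing a label.
halved = df.drop(indexes)

counts = halved.target.value_counts()
print(counts)
```

If the real index might contain duplicate labels, calling `df.reset_index(drop=True)` first should make the drop safe.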

To keep exactly 100K Positive rows, sample 100K Positive rows with n=100_000 and concat them with the Negative rows:

df = pd.concat([
    df[df.target == 'Negative'],
    df[df.target == 'Positive'].sample(n=100_000)
])
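
A small sketch of the fixed-count variant on toy data, in case it is useful. The names `n_keep`, `positive`, and `balanced`, the `min(...)` guard, and the final `sort_index()` are assumptions of this sketch rather than part of the answer above; the guard is there because `sample(n=...)` raises a `ValueError` when `n` exceeds the number of available rows (without `replace=True`).

```python
import pandas as pd

# Toy frame standing in for the real data (assumption): 3 Negative, 10 Positive rows.
df = pd.DataFrame({
    "target": ["Negative"] * 3 + ["Positive"] * 10,
    "data": range(13),
})

n_keep = 5  # hypothetical target count; the answer above uses 100_000

positive = df[df.target == "Positive"]

balanced = pd.concat([
    df[df.target == "Negative"],
    # Cap n at the number of Positive rows so sample() cannot fail.
    positive.sample(n=min(n_keep, len(positive)), random_state=0),
])

# Optional: concat puts all Negative rows first, so sort by index
# to restore the original row order.
balanced = balanced.sort_index()

print(balanced.target.value_counts())
```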
