[PYTHON- Library]★PDF to DataFrame★

2024. 4. 11. 15:08

1. Convert a table inside a PDF file into a pandas DataFrame.

from tabula import read_pdf
import pandas as pd

def read_pdf_table_to_dataframe(pdf_path, page_number):
    # tabula-py extracts tables from text-based PDFs; scanned (image-only) pages yield no tables.
    df_list = read_pdf(pdf_path, pages=page_number, multiple_tables=True)

    # read_pdf returns a list of DataFrames (one per detected table), so stack them into one.
    df = pd.concat(df_list, ignore_index=True) if df_list else pd.DataFrame()

    return df

# Specify the path to your PDF, and the page number you want to extract the table from.
pdf_path = '젠톡_김지은_2.pdf'  # Change to your PDF file path.
page_number = '8'  # Change to your page; tabula also accepts ranges such as '1-3' or 'all'.

# Call the function and get the DataFrame.
df = read_pdf_table_to_dataframe(pdf_path, page_number)

# Now you can work with the DataFrame.
print(df.head())
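Because `pd.concat(..., ignore_index=True)` simply stacks the tables, rows from different tables become indistinguishable, and any column present in only one table is filled with NaN in the others. If you need to know which table each row came from, the following is a minimal sketch using the `keys` and `names` arguments of `pd.concat`. The helper `concat_tables` is hypothetical, and the two small DataFrames are made-up stand-ins for what `read_pdf` would return.

```python
import pandas as pd


def concat_tables(tables):
    """Stack a list of DataFrames (e.g. read_pdf output) and record
    the position of each source table in a 'table' column."""
    if not tables:
        return pd.DataFrame()
    # keys adds an outer index level holding the table position (0, 1, ...).
    combined = pd.concat(tables, keys=list(range(len(tables))), names=["table", "row"])
    # Move the 'table' level into a regular column, then renumber the rows.
    return combined.reset_index(level="table").reset_index(drop=True)


# Hypothetical stand-ins for two tables detected on one page.
t1 = pd.DataFrame({"name": ["A", "B"], "score": [90, 85]})
t2 = pd.DataFrame({"name": ["C"], "grade": ["B+"]})

df = concat_tables([t1, t2])
print(df)
```

If the tables on a page have unrelated layouts, it may be simpler to skip concatenation and work with each DataFrame in `df_list` separately.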

① Install tabula-py with pip (version 2.9.0 was used here). Do not install the package named tabula; it is a different package.

② If the corresponding error appears when you run the code, install JPype1:

pip install JPype1
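Both setup steps above can be checked before running the conversion. The sketch below uses only the standard library; the helper names are hypothetical. It assumes the pip distribution names `tabula-py`, `tabula` and `JPype1`, that JPype1 is imported under the module name `jpype`, and that a `java` executable should be on the PATH, since tabula-py wraps the Java library tabula-java.

```python
import importlib.util
import shutil
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_tabula_setup():
    """Return a list of human-readable problems found in the tabula-py setup."""
    problems = []
    if installed_version("tabula-py") is None:
        problems.append("tabula-py is missing: pip install tabula-py")
    if installed_version("tabula") is not None:
        # Assumption: the unrelated 'tabula' package can shadow tabula-py's module.
        problems.append("unrelated 'tabula' package found: pip uninstall tabula")
    if importlib.util.find_spec("jpype") is None:
        # JPype1 is installed as 'JPype1' but imported as 'jpype'.
        problems.append("JPype1 is missing: pip install JPype1")
    if shutil.which("java") is None:
        problems.append("no 'java' executable found on PATH")
    return problems


for problem in check_tabula_setup():
    print(problem)
```

An empty output means none of these particular problems were detected; it does not guarantee that `read_pdf` will succeed on a given PDF.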
