What we're after:
_As a data user, I want to be able to view independent expenditure filings and know which filing is the most recent version, especially if the filing was amended._
We can achieve clarity by de-duping transaction filings for independent expenditures in processed and raw data and have the option to show most recent version of IEs. Raw data is higher priority since it's real time data.
@PaulClark2 These are all the places I could find where we provide independent expenditure data. Please let me know if you think I'm missing anything. Do you think all of these places are part of this deduping work? Or at this time are we just focusing on the IE datatable for processed and raw?
Of the places on .gov (listed in the previous comment) that currently present IE data, only the IE data table currently displays 24- and 48-hour report transactions. So that is where we will focus our attention at this time.
Beyond that, we need to get clarification on whether we want to tackle what's happening in RAW. While the Processed data is split between 24- and 48-hour reports and regularly scheduled reports, we do not make that distinction for raw transaction data. Is raw data something we want to tackle at this point? Or are we focusing on the 24- and 48-hour reports for Processed data?
@PaulClark2 Do you mind clarifying our intention for this work? Processed and raw?
cc: @lbeaufort
@JonellaCulmer I think I have a complication. I'm making sure I understand how the data is updated. I'll write more tomorrow.
This is the sql we use for the IE data we present in the Data Catalog on classic.gov. While reviewing the sql I notice we are incorrectly excluding some paper filed IEs.
SELECT a.CAN_ID,
a.CAND_NAM,
a.SPE_ID,
a.SPE_NAM,
a.ELE_TYP,
a.CAN_OFF_STA,
-- a.CAN_OFF_DIS,
(CASE
WHEN a.CAN_OFF_STA IN ('AK',
'DE',
'MT',
'ND',
'SD',
'VT',
'WY',
'AS',
'DC',
'GU',
'MP',
'PR',
'VI')
AND a.CAN_OFF = 'H'
THEN
'00'
WHEN a.CAN_OFF IN ('S', 'P')
THEN
'00'
ELSE
a.CAN_OFF_DIS
END)
CAN_OFF_DIS,
a.CAN_OFF,
c.pty_desc CAND_PTY_AFFILIATION,
a.EXP_AMO,
a.EXP_DAT,
a.AGG_AMO,
a.SUP_OPP,
a.PUR,
a.PAY,
a.FILE_NUM,
a.AMNDT_IND,
a.TRAN_ID,
a.IMAGE_NUM,
a.RECEIPT_DT,
a.FEC_ELECTION_YR,
a.PREVID,
a.DISSEM_DT
FROM (SELECT so_canid can_id,
so_can_name
|| CASE WHEN so_can_name IS NULL THEN NULL ELSE ', ' END
|| so_can_fname
cand_nam,
f57.comid spe_id,
f5.com_name spe_nam,
SUBSTR (item_elect_cd, 1, 1) ele_typ,
so_can_state can_off_sta,
so_can_dist can_off_dis,
so_can_off can_off,
amount exp_amo,
exp_date exp_dat,
ytd agg_amo,
supop sup_opp,
exp_desc pur,
f57.lname
|| (CASE
WHEN f57.fname IS NULL THEN NULL
ELSE ', ' || f57.fname
END)
pay,
f5.repid file_num,
CASE
WHEN reps.rptnum = 0 THEN 'N'
ELSE 'A' || TO_CHAR (REPS.RPTNUM)
END
AMNDT_IND,
tran_id,
f57.imageno image_num,
reps.filed_date receipt_dt,
CASE
WHEN f57.exp_date IS NOT NULL
THEN
CASE
WHEN MOD (TO_CHAR (f57.exp_date, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (f57.exp_date, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (f57.exp_date, 'YYYY')) + 1
END
ELSE
CASE
WHEN MOD (TO_CHAR (reps.filed_date, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (reps.filed_date, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (reps.filed_date, 'YYYY')) + 1
END
END
fec_election_yr,
reps.PREVID,
NULL AS DISSEM_DT
FROM real_efile.f57@realarch f57,
real_efile.f5@realarch,
real_efile.reps@realarch
WHERE f57.repid = f5.repid
AND reps.repid = f5.repid
AND f5.h_code IN (24, 48)
AND NVL (f5.through_date, reps.filed_date) >= '01-jan-2009'
AND reps.filed_date >= '01-jan-2009'
AND reps.EXCLUDE_IND IS NULL
UNION
SELECT NULL can_id,
so_last
|| ', '
|| so_first
|| CASE
WHEN so_middle IS NULL THEN NULL
ELSE ' ' || so_middle
END
cand_nam,
f57.comid spe_id,
pf5.name spe_nam,
SUBSTR (elc_code, 1, 1) ele_typ,
so_state can_off_sta,
so_dist can_off_dis,
so_office can_off,
amount exp_amo,
date_exp exp_dat,
cal_ytd agg_amo,
sup_opp,
purp_exp pur,
f57.LAST
|| (CASE
WHEN f57.FIRST IS NULL THEN NULL
ELSE ', ' || f57.FIRST
END)
pay,
pf5.repid file_num,
CASE WHEN ppreps.FORM LIKE '%N' THEN 'N' ELSE 'A' END
AMNDT_IND,
tran_id,
TO_NUMBER (f57.imgno) image_num,
ppreps.date_filed receipt_dt,
CASE
WHEN f57.date_exp IS NOT NULL
THEN
CASE
WHEN MOD (TO_CHAR (f57.date_exp, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (f57.date_exp, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (f57.date_exp, 'YYYY')) + 1
END
ELSE
CASE
WHEN MOD (TO_CHAR (ppreps.date_filed, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (ppreps.date_filed, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (ppreps.date_filed, 'YYYY'))
+ 1
END
END
fec_election_yr,
ppreps.PREVID,
NULL AS DISSEM_DT
FROM real_pfile.pf57@realarch f57,
real_pfile.pf5@realarch pf5,
real_pfile.ppreps@realarch
WHERE f57.repid = pf5.repid
AND ppreps.repid = pf5.repid
AND Pf5.hr_code IN (24, 48)
AND NVL (pf5.date_through, ppreps.received) >= '01-jan-2009'
AND ppreps.received >= '01-jan-2009'
AND ppreps.EXCLUDE_IND IS NULL
UNION
SELECT so_canid can_id,
so_can_name
|| CASE WHEN so_can_name IS NULL THEN NULL ELSE ', ' END
|| so_fname
cand_nam,
se.comid spe_id,
f24.name spe_nam,
SUBSTR (item_elect_cd, 1, 1) ele_typ,
so_can_state can_off_sta,
so_can_dist can_off_dis,
so_can_off can_off,
amount exp_amo,
t_date exp_dat,
ytd agg_amo,
position sup_opp,
transdesc pur,
se.lname
|| (CASE
WHEN se.fname IS NULL THEN NULL
ELSE ', ' || se.fname
END)
pay,
f24.repid file_num,
CASE
WHEN reps.rptnum = 0 THEN 'N'
ELSE 'A' || TO_CHAR (REPS.RPTNUM)
END
AMNDT_IND,
tran_id,
se.imageno image_num,
reps.filed_date receipt_dt,
-- case when mod(to_char(se.t_date,'YYYY'),2)=0 then to_number(to_char(se.t_date,'YYYY'))
-- else to_number(to_char(se.t_date,'YYYY'))+1 end fec_election_yr
CASE
WHEN se.t_date IS NOT NULL
THEN
CASE
WHEN MOD (TO_CHAR (se.t_date, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (se.t_date, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (se.t_date, 'YYYY')) + 1
END
ELSE
CASE
WHEN MOD (TO_CHAR (reps.filed_date, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (reps.filed_date, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (reps.filed_date, 'YYYY')) + 1
END
END
fec_election_yr,
reps.PREVID,
NULL AS DISSEM_DT
FROM real_efile.se@realarch se,
real_efile.f24@realarch,
real_efile.reps@realarch
WHERE se.repid = f24.repid
AND reps.repid = f24.repid
AND f24.rpttype IN (24, 48)
AND NVL (se.t_date, f24.sign_date) >= '01-jan-2009'
AND reps.EXCLUDE_IND IS NULL
UNION
--Now get the data from the current tables
SELECT so_canid can_id,
so_can_name
|| CASE WHEN so_can_name IS NULL THEN NULL ELSE ', ' END
|| so_can_fname
cand_nam,
f57.comid spe_id,
f5.com_name spe_nam,
SUBSTR (item_elect_cd, 1, 1) ele_typ,
so_can_state can_off_sta,
so_can_dist can_off_dis,
so_can_off can_off,
amount exp_amo,
exp_date exp_dat,
ytd agg_amo,
supop sup_opp,
exp_desc pur,
f57.lname
|| (CASE
WHEN f57.fname IS NULL THEN NULL
ELSE ', ' || f57.fname
END)
pay,
f5.repid file_num,
CASE
WHEN reps.rptnum = 0 THEN 'N'
ELSE 'A' || TO_CHAR (REPS.RPTNUM)
END
AMNDT_IND,
tran_id,
f57.imageno image_num,
reps.filed_date receipt_dt,
CASE
WHEN f57.exp_date IS NOT NULL
THEN
CASE
WHEN MOD (TO_CHAR (f57.exp_date, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (f57.exp_date, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (f57.exp_date, 'YYYY')) + 1
END
ELSE
CASE
WHEN MOD (TO_CHAR (reps.filed_date, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (reps.filed_date, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (reps.filed_date, 'YYYY')) + 1
END
END
fec_election_yr,
reps.PREVID,
NULL AS DISSEM_DT
FROM real_efile.f57 f57, real_efile.f5, real_efile.reps
WHERE f57.repid = f5.repid
AND reps.repid = f5.repid
AND f5.h_code IN (24, 48)
AND NVL (f5.through_date, reps.filed_date) >= '01-jan-2009'
AND reps.EXCLUDE_IND IS NULL
UNION
SELECT NULL can_id,
so_last
|| ', '
|| so_first
|| CASE
WHEN so_middle IS NULL THEN NULL
ELSE ' ' || so_middle
END
cand_nam,
f57.comid spe_id,
pf5.name spe_nam,
SUBSTR (elc_code, 1, 1) ele_typ,
so_state can_off_sta,
so_dist can_off_dis,
so_office can_off,
amount exp_amo,
date_exp exp_dat,
cal_ytd agg_amo,
sup_opp,
purp_exp pur,
f57.LAST
|| (CASE
WHEN f57.FIRST IS NULL THEN NULL
ELSE ', ' || f57.FIRST
END)
pay,
pf5.repid file_num,
CASE WHEN ppreps.FORM LIKE '%N' THEN 'N' ELSE 'A' END
AMNDT_IND,
tran_id,
TO_NUMBER (f57.imgno) image_num,
ppreps.date_filed receipt_dt,
CASE
WHEN f57.date_exp IS NOT NULL
THEN
CASE
WHEN MOD (TO_CHAR (f57.date_exp, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (f57.date_exp, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (f57.date_exp, 'YYYY')) + 1
END
ELSE
CASE
WHEN MOD (TO_CHAR (ppreps.date_filed, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (ppreps.date_filed, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (ppreps.date_filed, 'YYYY'))
+ 1
END
END
fec_election_yr,
ppreps.PREVID,
NULL AS DISSEM_DT
FROM real_pfile.pf57 f57, real_pfile.pf5 pf5, real_pfile.ppreps
WHERE f57.repid = pf5.repid
AND ppreps.repid = pf5.repid
AND Pf5.hr_code IN (24, 48)
AND NVL (pf5.date_through, ppreps.received) >= '01-jan-2009'
AND ppreps.EXCLUDE_IND IS NULL
UNION
SELECT so_canid can_id,
so_can_name
|| CASE WHEN so_can_name IS NULL THEN NULL ELSE ', ' END
|| so_fname
cand_nam,
se.comid spe_id,
f24.name spe_nam,
SUBSTR (item_elect_cd, 1, 1) ele_typ,
so_can_state can_off_sta,
so_can_dist can_off_dis,
so_can_off can_off,
amount exp_amo,
t_date exp_dat,
ytd agg_amo,
position sup_opp,
transdesc pur,
se.lname
|| (CASE
WHEN se.fname IS NULL THEN NULL
ELSE ', ' || se.fname
END)
pay,
f24.repid file_num,
CASE
WHEN reps.rptnum = 0 THEN 'N'
ELSE 'A' || TO_CHAR (REPS.RPTNUM)
END
AMNDT_IND,
tran_id,
se.imageno image_num,
reps.filed_date receipt_dt,
-- case when mod(to_char(se.t_date,'YYYY'),2)=0 then to_number(to_char(se.t_date,'YYYY')) else to_number(to_char(se.t_date,'YYYY'))+1 end fec_election_yr
CASE
WHEN se.t_date IS NOT NULL
THEN
CASE
WHEN MOD (TO_CHAR (se.t_date, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (se.t_date, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (se.t_date, 'YYYY')) + 1
END
ELSE
CASE
WHEN MOD (TO_CHAR (reps.filed_date, 'YYYY'), 2) = 0
THEN
TO_NUMBER (TO_CHAR (reps.filed_date, 'YYYY'))
ELSE
TO_NUMBER (TO_CHAR (reps.filed_date, 'YYYY')) + 1
END
END
fec_election_yr,
reps.PREVID,
se.DISSEM_DT
FROM real_efile.se se, real_efile.f24, real_efile.reps
WHERE se.repid = f24.repid
AND reps.repid = f24.repid
AND f24.rpttype IN (24, 48)
AND NVL (se.t_date, f24.sign_date) >= '01-jan-2009'
AND reps.EXCLUDE_IND IS NULL
ORDER BY file_num) a,
disclosure.cand_valid_fec_yr b,
staging.ref_pty c
WHERE a.can_id = b.cand_id(+)
AND a.fec_election_yr = b.fec_election_yr(+)
AND b.cand_pty_affiliation = c.pty_cd(+);
@AmyKort @JonellaCulmer @fecjjeng @lbeaufort @jason-upchurch @hcaofec
The request from Press and Public Records was for an as-amended set of independent expenditure (IE) transactions. Something like the IE data set you can find in the classic site's Data Catalog but as-amended, or de-duped.
The classic site's Data Catalog IE data set is fed by the following tables (see sql in comment above):
IEs reported by persons other than political committees
These tables include both electronically and paper filed 24- and 48-hour IE transactions. To display all 24- and 48-hour IE transactions from 2009 through current we will need to include transactions from real_pfile.se. The new set of tables would include these:
IEs reported by PACs and parties
ADD to union (contains 2009-current data)
No F24 data in real_pfile archive (no change – not included in current sql)
IEs reported by persons other than political committees
DROP from union (most recent data from 2008)
My understanding is we do not have an endpoint that can supply these data. In addition to making sure we have an endpoint and the correct tables feeding the endpoint, we need to build the "as-amended" logic.
Filter options should include
@JonellaCulmer ideally it's work on raw data and processed data.
@PaulClark2 per our discussion:
My understanding is we do not have an endpoint that can supply these data.
I think the existing processed 24/28 hour IE data (/schedules/schedule_e/) contains paper filings, and raw efiling IE datatable (/schedules/schedules_e/efile/) has 24/48/quarterly IE transactions.
the MV would need to be refreshed multiple/many times a day
If we keep the processed (paper) and raw efile data in separate data tables (and separate endpoints) we will be able to keep the raw data "instant" (no more than 5 minutes old) because it's fed by a View, which doesn't need to get refreshed. The trade-off is that (like we do for other data) if you want to see both paper and raw efiling data, you'll need to do two exports.
confirm are only including IE transactions filed on 24- and 48-hour reports
For now, an MVP state could be only including the "most recent" functionality for 24/48 raw IE transactions. Front end/Design work would need to add a toggle to raw like we do for processed between 24/48 and quarterly.
confirm we are including all sources paper and electronic F24/SE and F5/F57
We confirmed there are paper filing 24/48 IEs in the processed IE data table and 24/48 and quarterly in the efile IE datatable
Raw filings API work (https://github.com/fecgov/openFEC/issues/3813)
We'll need an API ticket to create additional filters listed above for the raw IE transactions (/schedules/schedule_e/efile/.
Paper filings only API work (#https://github.com/fecgov/openFEC/issues/3455)
One thing we found is that the "most recent" logic has been broken for paper files a while - per Paul we'll focus on raw filings as our MVP
Filter options should include
- [ ] Spender name
- [ ] Candidate name or ID
- [ ] Support/Oppose
- [ ] Candidate office state
- [ ] Candidate office district
- [ ] Candidate party
- [ ] Candidate office
- [ ] Filing date (F24 or F5 filing date)
- [ ] Expenditure date
- [ ] (maybe) Dissemination date
@PaulClark2 For which menu are these filter options needed? I'm assuming raw? Although, processed, at present, does not have the filing or dissemination date filters.
Below are a couple mockups of the IE deduping fix that I want to pose to the team for comment and feedback. In addition to the new filters we need to add to the Raw panel (see previous comments), we need to give users a way to filter by Most recent transactions. While I want to make it similar to the other "Most recent" filter used on the filings datatable, there are still some things we need to consider/discuss.
Below is the filter panel. It includes the new filters. Similar to the processed panel, do we need to include an option for users to choose a specific report that was filed when looking at the regularly scheduled reports? Currently that is grayed out in the mockup. Most recent filter is at the bottom of the panel. Lastly, I'm trying to keep it simple at this point - so I have the single checkbox. But we need to be able to explain what is included in the Most recent category, such as unknowns. So there is a tooltip that we can use to explain what's included.

Below is the flyout panel that includes the addition of the "Most recent version" language. This mockup is not as fleshed out as I'd like it to be. The reason being, is we include a button that says "Open original image". That can cause some confusion to show a button that says "original" and "most recent version" at the same time. I'm wondering if we're able to change the button text depending on what version we're looking at. Or does "original" mean something specific?

cc: @PaulClark2 @lbeaufort @AmyKort
@JonellaCulmer These look good to me. I'm interested in what @PaulClark2 has to say, since he's a lot better SME than I am. I do agree with you about the "original" language. Is it true if it just said "view image" or something?
and by true I think I meant "accurate."
@JonellaCulmer these look good to me. I think we should change the text on the view image button to view image. Including the word original might make users think they are looking at the image of the original report and not the most recent report.
Sounds good. Updated versioning mock-up below.

cc: @PaulClark2 @AmyKort
Looks good!
@PaulClark2 @AmyKort Okay, great! Still a couple things to address before moving this to an implementation ticket (while we wait for the back-end work to be done).
1. What does the tooltip message need to say.
Need to explain that unknown versions are included with most recent versions.
2. Do we need to include the filter for the regularly scheduled report form type?
Do we have enough information in the Raw to be able to distinguish between the reports so that this
filter can function?
cc: @lbeaufort
Based on a conversation with @PaulClark2 the filter panel has been updated to reflect the inclusion of the regularly scheduled report filter. However, its use is dependent on the filter above it. So it will remain grayed out until the user chooses to search Regularly scheduled reports using the toggle. We will also need to add this functionality to the processed panel.
Mock-up:

@dorothyyeager @kathycarothers @bmathesonFEC
Additionally, we need content help with the tooltip language. When I user filters by "Most recent version" they will also see Unknowns. So we need to address that in the tooltip. Below is what we have drafted so far.
"Most recent version also includes transactions where the version is unknown."
Depending on what the language says, we may want to change the formatting for the "Version unknown" language in the flyout. See flyout panel mockup in previous comment.
@lbeaufort @PaulClark2 We talked through the language for the version on the IE datatable. Let us know if you have any concerns with what was drafted below. Would love additional clarity on how we arrive at "unknown" transactions.
Update flyout panel version text to say: Unknown version
Draft tooltip language: "We are unable to determine the versioning for some transactions. These unknown transactions are included when filtering by most recent version."
cc: @dorothyyeager @bmathesonFEC @kathycarothers
@JonellaCulmer the draft tooltip language is accurate, and it makes sense to me. I'll deffer to the content folks for improvements.
Below are the final mockups for the IE deduping work


Closing in favor of front-end implementation work: https://github.com/fecgov/fec-cms/issues/3011
Most helpful comment
@JonellaCulmer these look good to me. I think we should change the text on the view image button to
view image. Including the wordoriginalmight make users think they are looking at the image of the original report and not the most recent report.